Dynamic Resource Management
Algorithms for Complex Systems and
Novel Approaches to Adaptive Kalman
Filtering
Lingyi Zhang
A Dissertation
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
at the
University of Connecticut
2020
Copyright by
Lingyi Zhang
2020
APPROVAL PAGE
Dynamic Resource Management Algorithms for Complex Systems and Novel Approaches to Adaptive Kalman Filtering
Presented by
Lingyi Zhang, B.S., M.S.
Major Advisor
Krishna R. Pattipati
Associate Advisor
Peter B. Luh
Associate Advisor
Yaakov Bar-Shalom
University of Connecticut
2020
ACKNOWLEDGMENTS
Thank you to my major advisor, Dr. Krishna Pattipati, for his guidance and
patience in molding me into who I am today. It is my utmost honor to work and
learn under his guidance and support. I would like to thank my associate advisor Dr.
Yaakov Bar-Shalom, whom I had the pleasure of writing a paper with. He has shaped
my presentation and writing style to be consistent and detail oriented. I also thank
Dr. Peter Luh for being on my committee, and whom I had the privilege of being a
student in his nonlinear optimization course.
I would like to express my appreciation for my friends/colleagues, David Sidoti,
Manisha Mishra, Vinod Avvari, Adam Bienkowski and the rest of the Cyberlab
members I had the honor to work with over the years in pursuit of this degree. I
would like to also particularly thank David Sidoti for being my mentor and aiding
my transition to the lab, providing me with valuable support, advice and guidance
throughout the years of my graduate school journey. Lastly, I want to thank my family
for their unconditional encouragement, support and love. I would not have made it
this far without their support.
Contents

1 Introduction
1.1 Background
1.2 Outline of the Dissertation
1.3 Publications
2.6.5 Runtime comparison for decomposition methods
2.6.6 Scalability with N
2.6.7 Scalability with R
4.3 Problem Formulation
4.3.1 Deterministic Problem
4.3.2 Multi-objective Extension
4.4 Fast Approximate Method for the Pareto-frontier Generation
4.4.1 Time Windows
4.4.2 1-Step Lookahead with Rollout Strategy
4.4.3 Gaussian Mixture Model and Silhouette Score to Reduce Problem Space
4.5 Simulation and Computational Results
4.5.1 Scenario Description
4.5.2 Solution Quality: NAMOA* vs NAPO
4.5.3 Scalability Analysis
5.10.1 Case 1
5.10.2 Case 2
5.10.3 Case 3
5.10.4 Case 4
5.10.5 Case 5
6 Conclusion
Bibliography
Chapter 1
Introduction
1.1 Background
This dissertation considers two broad topics, one motivated by the need to comply with
scarce resource management requirements as defense and industry look to continuously
accomplish more with less, and the other motivated by the previous limitations of
a steady-state data-driven Kalman filter. The first topic led to the development of
efficient dynamic resource management algorithms with applications to nuclear fuel
assembly loading pattern optimization, surveillance asset allocation for counter-drug
smuggling and multi-objective ship routing, while the second topic resulted in novel
approaches for estimating process and measurement noise covariances in adaptive
Kalman filtering.
The goal of automated decision making is to determine and understand the decision
context, and to effectively explore the problem space to present to the Decision Maker
(DM) ranked courses of action to choose from in a timely manner. For example, what
separates the nuclear fuel assembly loading problem from a traditional 3-dimensional
(3-D) assignment problem is the requirement to enumerate a dense set of discrete
loading patterns through a dynamically estimated probability distribution (represented
by a reward tensor). Evaluation of each loading pattern by reactor-physics-based
external code may be very time consuming (≈ 0.1 to 10 minutes, depending on
the required accuracy of loading pattern response evaluation). The key here is to
evaluate only new (unique) loading patterns (assignments). In application to maritime
surveillance and drug interdiction, the dynamic resource management problem under
uncertainty may be viewed as a moving horizon stochastic control problem. In the
context of a counter-smuggling mission, the key problem is to efficiently allocate a
set of heterogeneous sensing and interdiction assets to maximize the probability of
smuggler detection and interdiction, subject to mission constraints, by integrating
information, such as intelligence, weather, asset availability, asset capabilities (e.g.,
range, speed), sensor management, and asset assignment (e.g., many sensors may
need to be coordinated to obtain a better picture of the situation). This problem
is PSPACE-hard¹. In the application involving ship routing, the salient problem is
multi-objective planning in a dynamic and uncertain environment. The ship routing
problem is exacerbated by the need to address multiple conflicting objectives (as
many as fifteen objectives, such as fuel efficiency, voyage time, distance), spatial and
temporal uncertainty associated with the weather and multiple constraints on asset
operation (e.g., ship limits, navigator specified deadlines, bathymetry, waypoints, etc.).
Lastly, the second major thrust of this thesis is the identification of noise covariances
in a steady-state Kalman filter [85]. The Kalman filter is the state estimator for linear
¹In computational complexity theory, a problem is PSPACE-hard if it is at least as hard as every problem that can be solved using a polynomial amount of memory.
dynamic systems driven by Gaussian white noise with measurements corrupted by
Gaussian white noise. In the classical design of a Kalman filter, the noise covariance
matrices are assumed known and they, along with system dynamics, determine the
filter’s achievable accuracy. However, in many practical situations, including noisy
feature data in machine learning, the statistics of the noise covariances are often
unknown or only partially known. Thus, noise identification is an essential part
of adaptive filtering. Although this problem has a long history, reliable algorithms
for their estimation are not available, and necessary and sufficient conditions for
identifiability of the covariances are in dispute. We address both of these issues in
this dissertation.
auction algorithms, and the 2-D transportation problem is solved by the simplex-based
transportation, Transauction or RELAX-IV algorithms. The sequence of relaxed 2-D
problems is interchangeable, while adhering to the relaxed constraints. We validate
and compare the performance and utility of the proposed algorithms and search space
decomposition optimizations via extensive numerical experiments.
In Chapter 3, we tackle the problem of targeting under uncertainty, where we delve
into surveillance operations in counter-drug smuggling. We validate four approximate
dynamic programming approaches and three branch-and-cut-based methods on a
maritime surveillance problem involving the allocation of multiple heterogeneous
assets over a large area of responsibility to detect multiple drug smugglers using
heterogeneous types of transportation on the sea with varying contraband weights.
The asset allocation is based on a probability of activity surface, which represents
spatio-temporal target activity obtained by integrating intelligence data on drug
smugglers’ whereabouts/waypoints for contraband transportation, their behavior
models, and meteorological and oceanographic information. We validate the proposed
algorithmic concepts via realistic mission scenarios. We conduct scalability analyses
of the algorithms and conclude that effective asset allocations can be obtained within
seconds using rollout-based ADP. The contributions of this work have been transitioned
to and are currently being tested by Joint Interagency Task Force–South (JIATF-
South), an organization tasked with providing the initial line of defense against drug
trafficking in the Eastern Pacific Ocean and the Caribbean Sea.
Chapter 4 details an enhancement to TMPLAR, a mixed-initiative tool for multi-
objective planning and asset routing in dynamic and uncertain environments. It is
built upon multi-objective dynamic programming algorithms to route assets in a timely
fashion, while considering objectives, such as fuel efficiency, voyage time, distance, and
adherence to real world constraints (asset vehicle limits, navigator-specified deadlines,
etc.). The ship routing problem is exacerbated by the need to address multiple
conflicting objectives, spatial and temporal uncertainty associated with the weather
and multiple constraints on asset operation. The NAPO algorithm optimizes weather-
based objectives in a reasonable amount of time, optimizing arrival and departure
times at waypoints, asset speed and bearing. The key algorithmic contribution is a
fast approximate method for substantially containing the computational complexity by
generating the Pareto-front of the multi-objective shortest path problem for networks
with stochastic non-convex edge costs, utilizing approximate dynamic programming
and clustering techniques. The proposed algorithm is validated by comparing its
performance with the new approach to multi-objective A* (NAMOA*), an existing
multi-objective optimization algorithm.
In Chapter 5, we discuss the topic of adaptive Kalman filtering, where we present
a new approach to identifying the unknown noise covariances. The Kalman filter
requires knowledge of the noise statistics; however, the noise covariances are generally
unknown. Although this problem has a long history, reliable algorithms for their
estimation are scant, and necessary and sufficient conditions for identifiability of the
covariances are in dispute. We address both of these issues in this thesis. We first
present the necessary and sufficient conditions for unknown noise covariance estimation;
these conditions are related to the rank of a matrix involving the auto and cross-
covariances of a weighted sum of innovations, where the weights are the coefficients
of the minimal polynomial of the closed-loop system transition matrix of a stable,
but not necessarily optimal, Kalman filter. We present an optimization criterion
and a novel six-step approach based on a successive approximation, coupled with a
gradient algorithm with adaptive step sizes, to estimate the steady-state Kalman filter
gain, the unknown noise covariance matrices, as well as the state prediction (and
updated) error covariance matrix. Our approach enforces the structural assumptions
on unknown noise covariances and ensures symmetry and positive definiteness of the
estimated covariance matrices. We provide several approaches to estimate the unknown
measurement noise covariance R via post-fit residuals, an approach not yet exploited
in the literature. The validation of the proposed method on five different test cases
from the literature demonstrates that the proposed method significantly outperforms
previous state-of-the-art methods. It also offers a number of novel machine learning
motivated approaches, such as sequential (one sample at a time) and mini-batch-based
methods, to speed up the computations.
We summarize and discuss the research impact of the proposed approaches in
Chapter 6.
1.3 Publications
Journal papers that are accepted and published with primary authorship include
[186, 188, 190]:
Conference papers that are accepted and published with primary authorship
include [187, 189]:
Patents that are accepted and published with primary authorship include [184,185]:
Journal papers that are accepted and published with co-authorship include [158,
161]:
Conference papers that are accepted and published with co-authorship include
[8, 23, 72, 112, 119]:
3. D. Haste, S. Ghoshal, K. Pattipati, C. Moore, R. Martin, L. Zhang, and
J. Meyer, “Flexible Integrated System Health Management for Sustainable
Habitats,” in 2018 AIAA Information Systems–AIAA Infotech@Aerospace, FL,
USA, Jan. 2018, p. 1364.
Book chapters that are accepted and published with co-authorship include [118]:
Patent applications that are currently pending with co-authorship include [47]:
Chapter 2
2.1 Introduction
2.1.1 Motivation
Assignment problems are applicable to a diverse array of real world problems [46,55,140].
This set of problems takes the form of how best to assign a number of items or objects
to some (possibly different) number of machines or people during different time
periods. Assignment problems are of a combinatorial nature, each requiring some
form of an objective function to indicate the value or utility of individual assignments.
A sampling of how diverse and widely applicable such assignment problems are can
be seen from the following: multi-target tracking, quadratic assignment problems,
traveling salesman problems, or vehicle routing problems. Such problems also occur in
academia or the military, where a set of military troops [140] or teachers [55] must be
assigned to locations or classrooms that are temporally dependent in value or utility.
Assignment problems have even been motivated from a telecommunications standpoint,
where a set of satellites must be launched from a set of locations to maximize their
coverage [140].
A 2-dimensional (2-D) assignment problem may be viewed as a weighted bipartite
graph matching problem, where arcs must link two sets of nodes together such that an
objective function is optimized, while satisfying a set of one-to-one constraints. The 3-
dimensional (3-D) extension of this problem has been proven to be NP-hard [56,86,133].
In particular, one application that we focus on in this chapter is nuclear fuel assembly
(FA) loading pattern optimization. The core of a nuclear reactor is formed by large
sets of elongated, rectangular FAs arranged in a cylindrical fashion, as shown in Fig.
2.1.
The nuclear fuel assembly loading pattern optimization problem involves choosing:
1) the position of the FA in the nuclear reactor core, 2) the type of FA to put in
the chosen position, and 3) the rotation/orientation of the chosen FA type in the
chosen position. Each dimension of the 3-D assignment corresponds to each of the
decision variables above. In general, this problem is treated as a multiple objective
combinatorial problem, but what separates it from the traditional 3-D assignment
problems is the requirement for a dense set of new discrete loading patterns through a
dynamically estimated probability distribution (represented by a reward tensor). This
conversion to a 3-D assignment problem is a completely new approach for nuclear
fuel loading pattern optimization. The reward tensor is dynamically updated based
Figure 2.1: The core of a nuclear reactor is formed by large sets of fuel assemblies where
position, type, and rotation/orientation must be chosen for each one. Illustrated here is a
nuclear fuel assembly loading operation at Fangchenggang nuclear power plant in China’s
Guangxi province [1].
on the “best” solutions taken from the multi-objective Pareto front. “Best” in this
case may not necessarily refer to the optimal, but one of a large number of solutions
(assignments). By “large,” we mean on the order of $10^4$ solutions. Evaluation of each
loading pattern by reactor-physics-based external code may be very time consuming
(≈ 0.1 to 10 minutes, depending on the required accuracy of loading pattern response
evaluation), so there exists a need to evaluate only new (unique) loading patterns
(assignments).
In such scenarios, an m-best 3-D assignment problem is needed, wherein a large
set of solutions is generated in a reasonable amount of time (< 10 minutes for $10^4$
solutions), so that the set of assignments may be externally evaluated (each of which, in
turn, may take 0.1 to 10 minutes). It may also be a viable approach to obtain a dense
set of solutions that are near-optimal and satisfy the decision maker (such as in the case
of resource allocation or military troop allocation problems) or customer preferences
(as in [55], where they attempt to satisfy both student and tutor requirements or
requests). Having a large set of solutions offers a range of options that may be of
interest to a decision maker attempting to optimize with respect to multiple, possibly
conflicting, objectives.
This chapter offers an effective solution approach for finding a large number of
m-best solutions to the 3-D assignment problems with non-unity right-hand side
constraints with application to many real world challenges. The problem space may
be decomposed into multiple partitions based on the optimal assignment, as detailed
in [124]. Through a two-phase approach, we offer a method for rapidly generating
large numbers of solutions to the 3-D assignment problems.
In order to overcome the 3-D assignment problem’s inherent computational in-
tractability, a wide range of algorithms have been developed to obtain suboptimal
solutions, including greedy heuristics, genetic algorithms, simulated annealing, tabu
search, neural networks, and Lagrangian relaxation approaches [55, 111, 135, 145, 146].
Mazzola [111] proposed a heuristic branch-and-bound method to reduce the com-
putation time. In contrast, Frieze and Yadegar [55] applied Lagrangian relaxation
theory to a more general 3-D assignment problem with application to teaching practice
scheduling. The Lagrangian relaxation method of obtaining solutions to 3-D assign-
ment problems has become extremely prevalent in data association applications due
to the real time computation speed and solution quality [46, 135, 146]. Poore [145]
combined these two approaches, proposing a hybrid branch-and-bound and Lagrangian
relaxation algorithm to the 3-D assignment problem.
In this chapter, we seek to solve the aforementioned 3-D assignment problem,
but instead of finding a single solution, we aim to provide a large set of ranked
solutions. The process of finding the first best, second best, third best, and so on,
solution is known as the m-best optimization problem. The m-best optimization
problem occurs in a variety of contexts, including the shortest path [5,45,78], spanning
tree [3, 57, 65], traveling salesman [174], directed network [28], multi-target tracking
[16, 39, 40, 147, 148] and many other problems. The general approach to the m-best
optimization problem involves partitioning the solution space into smaller subspaces,
which are subproblems of the original problem. Murty’s search space decomposition
[124] is the most common and widely used technique, where the best solution is found
for each partitioned subproblem, given a modified solution subspace. Lawler [98]
applied Murty’s search space decomposition procedure within a more general framework
for a discrete optimization problem. Pascoal [134] proposed a variant of Murty’s search
space decomposition to reduce the algorithm’s complexity. This variant involved solving
the partitioned subsets in reverse order. Miller et al. [116] proposed modifications to
optimize Murty’s search space decomposition procedure to the 2-D assignment problem
via: 1) inherited dual variables and partial solutions from the initial subproblems;
2) sorting the subproblems based on lower bounds on the optimal reward before
solving the assignment problem; and 3) partitioning in an order based on lower bounds
on cumulative reward. These modifications substantially reduce the complexity of
Murty’s search space decomposition and are implemented in this chapter.
Another alternative way to solve the m-best optimization problem is by Gabow’s
[57] binary heap partition method. Similarly, Hamacher [64] also proposed using a
binary search tree procedure, while also combining an approach developed by Carraresi
and Sodini [32] to rank the paths. Chegireddy and Hamacher [34] extended this work
further and developed an m-best perfect matching algorithm based on the binary
partition of the solution space to apply to a bipartite matching problem in $O(kn^3)$
time. Recently, a modified version of Chegireddy and Hamacher’s algorithm
was developed for large datasets [102]. We suggest comparison of our algorithm with
those in [102] as future research.
The primary focus of this chapter is on combining a Lagrangian relaxation method and
m-best optimization to obtain a very large number of ranked solutions. Motivated by
an approach developed by Pattipati [135], we apply the Lagrangian relaxation approach
that successively solves a series of 2-D problems, since a key advantage of using the
Lagrangian relaxation method is that it prunes the solution space by computing the
upper and lower bounds. The first 2-D problem is a bipartite graph matching problem
(2-D assignment problem), which can be solved using either the auction algorithm or
the JVC algorithm [82]; the latter is more efficient for dense problem spaces [50]. The
feasible solution is obtained by solving a 2-D transportation problem (via a simplex
algorithm or Transauction algorithm) reconstructed from the relaxed solution of the
2-D assignment problem. The second step corresponds to imposing the originally
relaxed constraint on the first subproblem’s solutions. As in [147], we generate m-best
solutions by exploiting Murty’s search space decomposition procedure. Additionally, we
optimize Murty’s search space decomposition via Miller’s [116] proposed modifications.
An alternate Lagrangian relaxation method involves first solving a 2-D transportation
problem at each iteration of the 3-D assignment algorithm using either a simplex
algorithm or the Transauction algorithm, and subsequently reconstructing the feasible
solution via a 2-D assignment problem. We will show that the former Lagrangian
relaxation method is two orders of magnitude faster than the latter.
This chapter is organized as follows. We begin by introducing the problem
formulation in Section 2.2. In Section 2.3, we solve the m-best 3-D assignment problem
via Murty’s search space decomposition and the Lagrangian relaxation method. In
Section 2.4, we detail Miller et al.’s [116] search space optimizations and extend them
to the 3-D assignment problem. We provide the pseudocode of the fully optimized
m-best 3-D assignment solution algorithm in Section 2.5. In Section 2.6, we present
the results of the m-best 3-D assignment algorithm and the performance of each
different optimization technique.
2.2 Problem Formulation
The notation used in the remainder of this chapter is listed in Table 2.1.
Table 2.1
Summary of Notation
2.2.1 Problem Formulation
$$\max_{x_{ijk} \in \{0,1\}} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{R} w_{ijk}\, x_{ijk} \tag{2.1}$$
$$\text{s.t.} \quad \sum_{j=1}^{N} \sum_{k=1}^{R} x_{ijk} = 1, \quad i = 1, \ldots, N \tag{2.2}$$
$$\sum_{i=1}^{N} \sum_{k=1}^{R} x_{ijk} = 1, \quad j = 1, \ldots, N \tag{2.3}$$
$$\sum_{i=1}^{N} \sum_{j=1}^{N} x_{ijk} \leq m_k, \quad k = 1, \ldots, R \tag{2.4}$$
where xijk is a binary decision variable such that xijk = 1 if resource (row) i is assigned
to task (column) j at time (layer) k, and 0 otherwise. Constraints (2.2) and (2.3)
ensure that each resource i is allocated to exactly one task j and vice versa. Constraint
(2.4) requires that there may be no more than mk assignments at each time k and
makes this assignment problem non-standard.
Figure 2.2 shows the 3-D assignment problem as a network flow problem. Consider
the first set, indexed by i, and the second set, indexed by j, each consisting of N nodes.
Also, consider a third set, indexed by k, with a total of R nodes. There are a total of
N assignments that may be made between sets i and j based on constraints (2.2) and
(2.3). We view this as a 2-D assignment problem (indicated by the solid box in Fig.
2.2). Additionally, each node in set j must be assigned to one of the nodes in set k
(indicated by the dashed (blue) box in Fig. 2.2). Due to constraint (2.4), for every k,
there may be no more than mk assignment pairs of (i, j) mapped to each layer. This
may be viewed as an unbalanced transportation problem, where the nodes in set (i, j)
are the sources and the nodes in set k are the sinks. Note that $\{m_k : k = 1, 2, \ldots, R\}$ should be such that $\sum_{k=1}^{R} m_k \geq N$, so that each (i, j) can be assigned to a node k.
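To make the formulation concrete, the short sketch below checks a candidate binary solution tensor against constraints (2.2)–(2.4); the function name and the NumPy usage are illustrative assumptions rather than part of the dissertation's implementation.

```python
import numpy as np

def is_feasible(x, m):
    """Check a binary tensor x of shape (N, N, R) against (2.2)-(2.4).

    x[i, j, k] = 1 iff resource i is assigned to task j in layer k;
    m is the length-R vector of layer capacities m_k (with sum(m) >= N).
    """
    row_ok = np.all(x.sum(axis=(1, 2)) == 1)  # (2.2): each i used exactly once
    col_ok = np.all(x.sum(axis=(0, 2)) == 1)  # (2.3): each j used exactly once
    cap_ok = np.all(x.sum(axis=(0, 1)) <= m)  # (2.4): at most m_k per layer k
    return bool(row_ok and col_ok and cap_ok)
```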
Note that our 3-D assignment problem formulation covers a wide range of problems; indeed, it is a special case of the general transportation problem, which is obtained by altering the unity right-hand-side values to non-unity values.
Figure 2.2: Network flow view of the 3-D assignment problem, originally presented
in [189].
When $m_k \geq N$ for every k, the problem reduces to the traditional 2-D assignment problem, since constraint (2.4) can be subsumed
under constraints (2.2) and (2.3) and is, thus, unnecessary. The 3-D assignment
problem posed in (2.1) then devolves to a 2-D assignment problem, detailed later in
Section 2.3.1.4. An m-best 2-D assignment problem is adequate for this version of the
problem.
In phase I, Murty’s search space decomposition partitions the problem space into a series of subproblems. Each subproblem is then relaxed and solved by a 3-D assignment algorithm in phase II.
We adopt the solution approach of the 3-D assignment problem in [135] by relaxing
one of the three constraints and solving the 3-D assignment problem as a series of 2-D
subproblems. Since sets i and j have the unity constraint, a similar solution approach
can be applied to the 3-D assignment problem here by relaxing either of the two sets
of constraints. We then denote Relaxation Method I and Relaxation Method II as
the solution approaches for the 3-D assignment problem when constraints (2.4) or
(2.2)/(2.3) are relaxed, respectively.
When constraint (2.4) is relaxed via Lagrange multipliers $\mu_k$, the Lagrangian function is
$$L(x, \mu) = \max_{x_{ijk} \in \{0,1\}} \left( \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{R} (w_{ijk} - \mu_k)\, x_{ijk} \right) + \sum_{k=1}^{R} m_k \mu_k \tag{2.5}$$
Equation (2.5) is then a relaxed 2-D assignment problem of the form,
$$\max_{y_{ij} \in \{0,1\}} \sum_{i=1}^{N} \sum_{j=1}^{N} \max_{k} (w_{ijk} - \mu_k)\, y_{ij} \tag{2.6}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} y_{ij} = 1, \quad j = 1, \ldots, N \tag{2.7}$$
$$\sum_{j=1}^{N} y_{ij} = 1, \quad i = 1, \ldots, N \tag{2.8}$$
where
$$y_{ij} = \sum_{k=1}^{R} x_{ijk}; \quad i, j = 1, \ldots, N. \tag{2.9}$$
The upper bound q of the relaxed 2-D assignment problem is easily obtained via a 2-D
assignment algorithm. To obtain a feasible solution, we reimpose constraint (2.4) by
reconstructing the reward tensor and viewing the asymmetric bipartite graph as a
transportation problem based on the solution of the relaxed 2-D assignment problem.
For each $\langle i^*, j^* \rangle$ of the relaxed 2-D assignment problem at each iteration, the reward matrix is dynamically updated for each layer k. Given a new reward matrix $\tilde{w}_{\langle i,j \rangle k}$, the transportation variation of the problem is as follows.
$$\max_{z_{jk} \in \{0,1\}} \sum_{j=1}^{N} \sum_{k=1}^{R} \tilde{w}_{\langle i,j \rangle k}\, z_{jk} \tag{2.10}$$
$$\text{s.t.} \quad \sum_{k=1}^{R} z_{jk} = 1, \quad j = 1, \ldots, N \tag{2.11}$$
$$\sum_{j=1}^{N} z_{jk} \leq m_k, \quad k = 1, \ldots, R \tag{2.12}$$
Through this sequence, we obtain a feasible solution and a lower bound f . The upper
and lower bounds serve as measures of the solution quality. The distance between
these bounds is referred to as the approximate duality gap (it overestimates the true duality gap $q - f^*$ by $f^* - f$, where $f^*$ is the optimal value). For discrete 3-D assignment problems, the duality gap may be nonzero. The relative approximate duality gap is given by
$$\text{gap} = \frac{|q - f|}{f} \tag{2.13}$$
where q and f are the upper and lower bounds, respectively, obtained by solving
the series of 2-D subproblems. The 3-D assignment algorithm terminates for a
sufficiently small gap, which implies that a near-optimal solution has been obtained.
In scenarios where the duality gap is large, the 3-D assignment algorithm updates its
Lagrange multipliers via the method proposed in Pattipati [135]. Let us denote g as
an R-dimensional subgradient vector with components given by
$$g_k = R - \sum_{i=1}^{N} \sum_{j=1}^{N} X_{ijk}, \quad k = 1, \ldots, R, \tag{2.14}$$
where X is the solution tensor related to the optimal value of the relaxed 2-D assignment variables $\{y_{ij}^*\}$ via
$$X_{ijk} = \begin{cases} y_{ij}^*, & \text{if } k = \arg\max_{\alpha} (w_{ij\alpha} - \mu_{\alpha}) \\ 0, & \text{otherwise.} \end{cases}$$
We then update the Lagrange multipliers by
$$\mu_k = \max\left( \mu_k - \frac{(q - f)}{\|g\|_2^2}\, g_k,\; 0 \right). \tag{2.15}$$
After updating the Lagrange multipliers, the algorithm iterates back to the relaxation
step. The process continues until either the maximum number of iterations is reached
or the duality gap is sufficiently small. The flow diagram of the 3-D assignment
algorithm when the constraint in (2.4) is relaxed is shown in Fig. 2.3.
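For concreteness, the following is a minimal sketch (not the dissertation's MATLAB implementation) of one pass of the loop in Fig. 2.3, assuming SciPy's linear_sum_assignment in place of the JVC/auction solver and solving the transportation step by replicating each layer k as $m_k$ identical sink columns (the transportation-to-assignment mapping noted earlier). The subgradient below uses the capacity form $m_k - \sum_i \sum_j X_{ijk}$, and all function and variable names are our own.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relax_once(w, mu, m):
    """One Lagrangian-relaxation pass for the problem (2.1)-(2.4).

    w : (N, N, R) reward tensor; mu : (R,) multipliers for constraint (2.4);
    m : (R,) integer capacities with m.sum() >= N.
    """
    # Relaxed 2-D assignment, eq. (2.6): collapse k by max_k (w_ijk - mu_k).
    shifted = w - mu[None, None, :]
    c = shifted.max(axis=2)
    best_k = shifted.argmax(axis=2)
    rows, cols = linear_sum_assignment(c, maximize=True)
    q = c[rows, cols].sum() + (m * mu).sum()       # upper bound, eq. (2.5)

    # Feasible solution: give each best (i*, j*) pair a layer through a
    # transportation problem, solved as a rectangular assignment by
    # replicating layer k as m_k identical sink columns.
    layer_of_col = np.repeat(np.arange(w.shape[2]), m)
    t = w[rows[:, None], cols[:, None], layer_of_col[None, :]]
    p_rows, p_cols = linear_sum_assignment(t, maximize=True)
    f = t[p_rows, p_cols].sum()                    # lower bound

    gap = abs(q - f) / abs(f)                      # eq. (2.13)

    # Subgradient step on mu, in the spirit of eqs. (2.14)-(2.15).
    g = m.astype(float)
    for i, j in zip(rows, cols):
        g[best_k[i, j]] -= 1.0
    if g @ g > 0:
        mu = np.maximum(mu - (q - f) / (g @ g) * g, 0.0)
    return q, f, gap, mu
```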
Note that a relaxed problem is also obtainable by interchanging the sequence of 2-D
subproblems. In other words, we may apply the Lagrangian relaxation on constraints
(2.2) or (2.3). When constraint (2.3) is relaxed via Lagrange multipliers µj , the
Lagrangian function is:
$$L(x, \mu) = \max_{x_{ijk} \in \{0,1\}} \left( \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{R} (w_{ijk} - \mu_j)\, x_{ijk} \right) + \sum_{j=1}^{N} \mu_j \tag{2.16}$$
The 3-D assignment problem is then relaxed into a 2-D transportation problem of the
form
$$\max_{z_{ik} \in \{0,1\}} \sum_{i=1}^{N} \sum_{k=1}^{R} \max_{j} (w_{ijk} - \mu_j)\, z_{ik} \tag{2.17}$$
$$\text{s.t.} \quad \sum_{k=1}^{R} z_{ik} = 1, \quad i = 1, \ldots, N \tag{2.18}$$
$$\sum_{i=1}^{N} z_{ik} \leq m_k, \quad k = 1, \ldots, R, \tag{2.19}$$
where
$$z_{ik} = \sum_{j=1}^{N} x_{ijk}; \quad i = 1, \ldots, N; \; k = 1, \ldots, R. \tag{2.20}$$
The upper bound q can be obtained by solving the relaxed 2-D transportation problem.
The 2-D assignment problem is obtained by reimposing constraint (2.3) and recon-
structing the reward tensor based on the solution of the relaxed 2-D transportation
problem. The assignment variation of the problem is as follows.
$$\max_{y_{ij} \in \{0,1\}} \sum_{i=1}^{N} \sum_{j=1}^{N} \tilde{w}_{\langle i,k \rangle j}\, y_{ij} \tag{2.21}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} y_{ij} = 1, \quad j = 1, \ldots, N \tag{2.22}$$
$$\sum_{j=1}^{N} y_{ij} = 1, \quad i = 1, \ldots, N \tag{2.23}$$
A feasible solution and a lower bound f can be obtained through this sequence.
The duality gap is then computed and compared for algorithm termination. The
subgradient is updated in a similar fashion to the first relaxation method, except that
it is with respect to dimension j and uses binary decision variables {zik }.
We implemented three algorithms to solve the 2-D transportation problem. The first utilizes the Transauction algorithm developed by Bertsekas and Castañón [20], which
solves the transportation problem by mapping it to an assignment problem and obtains
a solution via a modified auction algorithm. In the second algorithm, we exploit the
findings in [133, 167], where the transportation problem was found to be equivalent to
the minimum cost network flow problem, and solve the 2-D transportation problem
via a (weakly polynomial) simplex-based method. We refer to this simply as simplex-
based transportation. The third algorithm is the RELAX-IV algorithm developed by
Bertsekas and Tseng [21] and further detailed in [54]. It is one of the most efficient
algorithms to solve problems of the network flow type.
Suppose that, for this problem, constraint (2.4) is such that $m_k = N$. In this case, the summations
over sets i and j are always less than or equal to N for each k, and, consequently,
constraint (2.4) is always satisfied. This implies that the constraints in (2.4) are
inactive and the Lagrange multipliers µk = 0 for k = 1, 2, . . . , R. Consequently, the
3-D assignment problem takes the form,
$$\max_{y_{ij} \in \{0,1\}} \sum_{i=1}^{N} \sum_{j=1}^{N} \max_{k} (w_{ijk})\, y_{ij} \tag{2.24}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} y_{ij} = 1, \quad j = 1, \ldots, N \tag{2.25}$$
$$\sum_{j=1}^{N} y_{ij} = 1, \quad i = 1, \ldots, N \tag{2.26}$$
Figure 2.3: Flow diagram of the 3-D assignment algorithm when relaxing constraint (2.4).
This problem can be easily solved by an m-best 2-D assignment algorithm. Further-
more, the approach is the same for the general case when mk ≥ N .
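As a minimal illustration of this special case (assuming SciPy is available; the variable names are ours), collapsing the k dimension by a maximum reduces (2.24)–(2.26) to a standard 2-D assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

w = np.random.rand(30, 30, 8)                 # example reward tensor
c = w.max(axis=2)                             # collapse k, as in (2.24)
rows, cols = linear_sum_assignment(c, maximize=True)
best_k = w[rows, cols, :].argmax(axis=1)      # recover the layer per pair
value = c[rows, cols].sum()                   # optimal objective value
```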
Let P0 be the original problem in equations (2.1)–(2.4) and let A be the corresponding
assignment solution space. Further, let A∗0 be the best feasible assignment found
by the 3-D assignment algorithm detailed in Section 2.3.1. In general, to find the
(n + 1)th best solution, we have to partition the (n + 1)th problem space, Pn , into
N subproblems, denoted by Pnr , 1 ≤ r ≤ N . Then, the complete solution space
corresponding to problem space Pn is
$$A_n = \bigcup_{r=1}^{N} A_{nr} = A - \bigcup_{i=0}^{n-1} A_i^* \quad \text{for } n = 1, 2, \ldots, m \tag{2.27}$$
where Anr denotes a set of tuples in which each i and j appear exactly once, but
k may be repeated. Equation (2.27) is a formalization of the constraint that the
solution space An for the (n + 1)th best solution will not contain any of the best
solutions obtained for the previous n problems. Here, a complete feasible solution is assumed to be a set of tuples. Hence, some solutions may share tuples; however, as formalized in (2.28), each set of solution tuples as a whole is unique, differing by at least one element from each of the previous n solutions. Let an assignment Anr consist of
multiple tuples (in this chapter, triples), where we index the triples within by t. Let
`nrt be the individual reward of the tth triple, sub-indexed as < inrt , jnrt , knrt >, in
the solution space Anr . We can then augment the triple into a 4-tuple and write a
feasible assignment in Anr as
$$S_{nr} = \{\langle i_{nrt}, j_{nrt}, k_{nrt}, \ell_{nrt} \rangle : t = 1, 2, \ldots, N\}$$
The primal value of the corresponding assignment Snr is denoted by fnr , which can
be obtained by summing $\ell_{nrt}$ over $t = 1, 2, \ldots, N$:
$$f_{nr} = \sum_{t=1}^{N} \ell_{nrt} \tag{2.30}$$
The best assignment $A_{nr}^*$ with the corresponding primal value $f_{nr}^*$ in the solution space $A_{nr}$ is found via the 3-D assignment algorithm described earlier and pertains specifically to partition r. The best assignment $A_n^*$ is found by iterating over all active partitions and finding the argument $r^*$ that has the maximum primal value.
Given the original problem space and its optimal assignment, denoted by P0 and
A∗0 , respectively, we partition P0 into N problem subspaces P11 to P1N in order to
find the next best solution. To generate subproblem P11 , we remove the first of N
tuples in the assignment A∗0 . We then use the 3-D assignment algorithm to obtain the
best possible solution $A_{11}^*$ to problem $P_{11}$. To partition the subspace $P_{1s}$, $2 \leq s \leq N$, we exclude the sth tuple of $A_0^*$ from the feasible assignments in $P_{1s}$, while fixing the first $(s-1)$ triples to those in the original assignment $A_0^*$. Thus, as the solution and
problem spaces are reduced at every search space decomposition, the complexity of
the problem decreases substantially, since the first (s − 1) triples are reused from the
previous assignments. We then only need to find assignments for the remaining N − s
assignments, such that the sth triple from the original assignment A∗0 is not contained
in the solution, while satisfying the constraints. The enforcement of tuples to be either
in or be removed from the problem spaces P11 to P1N during partitioning ensures the
disjointness of the individual subproblems, as in equation (2.28).
Each of the best solutions $A_{11}^*$ to $A_{1N}^*$ is saved into a heap and sorted based on the respective primal values, $f_{11}^*$ to $f_{1N}^*$. The best solution within the heap
is then removed and saved as the second best solution. The problem corresponding to
the second best solution is then partitioned into the subproblems P21 to P2N . The
best assignment from the top of the heap is then marked as the third best assignment
with respect to the original problem P0 . We continue to apply this process until the
mth best solution is found or, alternatively, the heap becomes empty.
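To ground the procedure, the sketch below applies the same partitioning logic to the simpler 2-D assignment problem, with SciPy's linear_sum_assignment playing the role of the subproblem solver; in the 3-D case that solver is replaced by the Lagrangian relaxation algorithm of Section 2.3. All names here are illustrative, and a large negative constant stands in for $-\infty$.

```python
import heapq
import numpy as np
from scipy.optimize import linear_sum_assignment

NEG = -1e9  # stand-in for -infinity when excluding an arc

def constrained_best(w, fixed, excluded):
    """Max-reward 2-D assignment with some arcs forced in or excluded."""
    c = w.copy()
    for i, j in excluded:
        c[i, j] = NEG
    for i, j in fixed:               # forcing (i, j): row i keeps only arc j
        keep = c[i, j]
        c[i, :] = NEG
        c[i, j] = keep
    rows, cols = linear_sum_assignment(c, maximize=True)
    if c[rows, cols].min() <= NEG / 2:   # a forbidden arc was unavoidable
        return None, None
    return list(zip(rows, cols)), w[rows, cols].sum()

def m_best(w, m):
    """Return up to m ranked solutions via Murty-style partitioning."""
    sol, val = constrained_best(w, [], [])
    heap = [(-val, sol, [], [])]     # min-heap keyed on negated reward
    ranked = []
    while heap and len(ranked) < m:
        negval, sol, fixed, excluded = heapq.heappop(heap)
        ranked.append((-negval, sol))
        # Order arcs so that already-fixed arcs form the prefix of sol.
        sol = fixed + [a for a in sol if a not in fixed]
        # Subproblem t: fix the first t arcs, exclude the (t+1)-th.
        for t in range(len(fixed), len(sol)):
            f2, e2 = sol[:t], excluded + [sol[t]]
            s2, v2 = constrained_best(w, f2, e2)
            if s2 is not None:
                heapq.heappush(heap, (-v2, s2, f2, e2))
    return ranked
```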
Murty’s search space decomposition is an ingenious way of decomposing the
search space, and has a number of applications in combinatorial optimization [66, 98].
Optimizations of the decomposition technique to improve the computational efficiency
are discussed in Section 2.4.
Remark : For small size problems and large m, if we apply the Lagrangian relaxation
on constraint (2.4), the transportation problem reconstructed from the best (i, j) pair
of the 2-D assignment problem may contain too many removed arcs and, thus, no
feasible solution may exist. In this case, by interchanging the sequence of relaxed
problems solved (i.e., solve the 2-D transportation problem first, as opposed to the
assignment problem (normally solved first)), we can obtain a feasible solution to
the 3-D assignment problem. This situation arises in small size problems (e.g., of
dimension 3 × 3 × 2). However, since the tensor dimensions used in this chapter are
large, the solution space is vast and this anomaly did not arise.
2.4 Optimized implementation of Murty’s search
space decomposition
We extend the 2-D optimization modifications in [116] to the 3-D assignment prob-
lems. These include: 1) inheriting the dual variables and partial solutions from the
subproblems being decomposed; 2) sorting the subproblems by an upper bound on
reward before solving; and 3) partitioning the subproblems in an optimized order.
All three modifications exploit the primal-dual aspects of the JVC algorithm. The
following sections explain each modification in detail for the case when the constraints
in (2.4) are relaxed in the m-best 3-D assignment algorithm. Similar optimization
techniques can be applied for the case when the constraints in (2.3) are relaxed.
Solving the 3-D assignment problem via the JVC algorithm provides dual variables u
and v, which can be inherited by the partitioned subproblem using Murty’s search
space decomposition. The solution tensor Xn , for the problem space Pn and the
reward tensor W , contains N solution triples < i∗ , j ∗ , k ∗ >. During each step of
Murty’s search space decomposition, a new subproblem Pnr is generated, associated
with a new reward tensor W 0 . Removing the triple < i∗ , j ∗ , k ∗ > from the subproblem
space Pnr is equivalent to setting w<i∗ ,j ∗ ,k∗ > = −∞. This implies we may skip the
initialization step for the JVC algorithm and go directly to the augmentation step
with only one arc left to assign in the 2-D assignment problem, following the procedure
outlined in Algorithm 1. In this case, the initialization step is only required for the
first feasible solution to the 3-D assignment problem.
Algorithm 1 Upper bound reward calculation when inheriting dual variables
1: for each ⟨i*, j*, k*⟩ ∈ A do
2:   w_⟨i*,j*,k*⟩ = −∞
3:   u′ = u, v′ = v
4:   X′ ← X − X_⟨i*,j*,k*⟩
5: end for
Note that we cannot inherit the Lagrange multipliers µk from the previous problem
Pn in the process of partitioning the subproblems. The Lagrange multipliers from the
previous problem Pn may be too large for the subproblems Pnr , r = 1, 2, . . . , N . This
may cause the duality gap to remain above the threshold value required to terminate.
Thus, the algorithm will continue to run until the maximum iteration limit is reached.
The upper bound reward of individual subproblems is easily obtainable and can be
used to avoid solving subproblems that are unlikely to produce the next best solution.
For an m-best assignment problem, the best solution from problem Pn is always
better than the best solutions of the subproblems Pnr, r = 1, 2, . . . , N, obtained by
partitioning Pn. Therefore, for an m-best 2-D assignment problem, the objective
function of the solution to Pn can be used as an initial upper bound on the objective
function value of the best solution to its corresponding subproblems. Since 3-D
assignment problems may have a nonzero duality gap, the computation of the upper
bound can be determined using either the dual value (denoted by φ) or the primal
value (denoted by ω) as initial upper bounds to the partitioned subproblems.
When a subproblem Pnr is created by removing a triple < i∗ , j ∗ , k ∗ > from a copy
of P, we can compute the upper bound objective function value by finding the best
slack (i.e., next possible best assignment) of all the alternative assignments for a row i.
The upper bound objective function value will be the sum of the initial upper bound
and the row slack, denoted by Br . The calculation of the upper bound is shown in
detail in Algorithm 2.
A similar procedure can be followed for column j to find the column slack, Bc .
By combining both the row and the column slack, a tighter upper bound can be
obtained. The heap of subproblems can be modified to sort its elements (in descending
order) based on each element’s respective upper bound reward. This implies that the
problems located at the top of the heap are most likely to have the best solutions.
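A minimal sketch of this slack computation follows (mirroring the B_r and B_c terms that appear in Algorithm 4; the NumPy usage and function name are our own assumptions, and any excluded arcs are assumed to have already been set to $-\infty$ in w):

```python
import numpy as np

def slack_bounds(w, u, v, mu, i_star, j_star):
    """Row and column slacks for the arc (i*, j*) from the dual variables.

    w: (N, N, R) rewards; u, v: (N,) assignment duals; mu: (R,) multipliers.
    Returns the best alternative reduced rewards B_r (over row i*) and
    B_c (over column j*), used to tighten a subproblem's upper bound.
    """
    reduced = w - u[:, None, None] - v[None, :, None] - mu[None, None, :]
    b_r = reduced[i_star, :, :].max()   # best slack over (j, k) in row i*
    b_c = reduced[:, j_star, :].max()   # best slack over (i, k) in column j*
    return b_r, b_c
```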
In this optimization method, the initial problem is partitioned into a series of
subproblems when it is solved by the 3-D assignment algorithm. Both the original
problem and its corresponding subproblems are saved into a heap. During each
iteration of Murty’s search space decomposition, if the top problem Pn removed from
the heap has a feasible solution, then the solution will be saved as the mth best
assignment. If Pn has not yet been solved (i.e., it has a partial solution), then we find
its best solution A∗n using the 3-D assignment algorithm and add it back into the heap.
A new partitioning process is then invoked on Pn and its solution A∗n . The process is
repeated until the heap is empty or a total of m solutions are obtained. This method
allows us to eliminate subproblems by focusing on their corresponding upper bounds,
thus reducing the number of problems needed to be solved by the 3-D assignment
algorithm.
The third optimization method proposed here is to carefully select the order in which
the partitioning is performed. This modification maximizes the probability that the
subsequent smaller subproblems (with a greater number of fixed arcs) have better
solutions. For problem Pn with solution A∗n that contains N triples, we first compute
each upper bound reward that would result from excluding each individual arc. These
upper bounds are computed via the method explained in Section 2.4.2. We then
select the triple that corresponds to the lowest upper bound reward computed and
exclude it from the current subproblem, while fixing the corresponding arc in the next
subproblem.
In this modification, the heuristic tends to ensure that the largest problem (maxi-
mum number of unassigned arcs) has the lowest upper bound. In other words, the
largest problem has the highest probability of containing the worst solution and of
being pushed to the bottom of the heap (and, in turn, will most likely remain unsolved
upon algorithm termination). The next worst problem will tend to be the second
largest subproblem, and so on. By doing this, we increase the chance that the smallest
problem (the one with the fewest unassigned arcs) contains the best
solution.
2.5 Pseudocode
The following variants were used and/or combined for different optimization methods:
(A) Inheritance of the dual variables and partial solutions during partitioning
(B) Sorting subproblems by an upper bound reward before solving, where the upper
bound is calculated via:
i ω + Br
ii ω + Br + Bc
iii φ + Br
iv φ + Br + Bc
(C) Partitioning the subproblems in an optimized order
These variants are denoted as listed for the remainder of the chapter and may be combined; e.g., when combining variant A with variant B(ii) and variant C, the algorithm variant is categorized as A+B(ii)+C. The pseudocode for Murty’s
modified search space decomposition, optimized via variants A, B(ii), and C, is
detailed in Algorithm 3. These variants assume JVC and Transauction to be applied
in the m-best 3-D assignment algorithm.
2.6 Results
The proposed m-best 3-D assignment algorithm was implemented in MATLAB 2016b and run on an Intel Core i7-4712HQ CPU @ 2.30 GHz with 16 GB RAM. In all experiments, the top $10^4$ ranked solutions were computed.
Algorithm 3 m-best 3-D assignment algorithm
1: H ← {}  ▷ Initialize binary heap
2: U ← []  ▷ Initialize solution list
3: ⟨A*_0, P_0, f*_0⟩ = 3DAssign(w_ijk)
4: Partition(H, P_0, A*_0)  ▷ Invoke Partition method
5: H ← ⟨A*_0, P_0, f*_0⟩  ▷ Add to the heap
6: counter = 0
7: while counter ≤ m − 1 and H ≠ ∅ do
8:   ⟨A*_n, P_n, f*_n⟩ = H.pop
9:   if A*_n is feasible then
10:    counter = counter + 1
11:    U ← A*_n, f*_n
12:  else
13:    ⟨A*_n, P_n, f*_n⟩ = 3DAssign(w_ijk, ⟨A*_n, P_n, f*_n⟩)
14:    if ∃ solution then
15:      Partition(H, ⟨A*_n, P_n, f*_n⟩)
16:      H ← ⟨A*_n, P_n, f*_n⟩
17:    end if
18:  end if
19: end while
We first performed 10 Monte Carlo runs to compare the simulation runtimes of the 3-D
assignment algorithm when relaxing either constraint (2.3) or constraint (2.4). The
reward tensor elements were uniformly distributed in the interval [0,1] and of dimension
60×60×8. The JVC and Transauction algorithms were implemented to solve the
2-D assignment and the transportation problems, respectively. As shown in Table
2.2, a speedup of as much as 2.28 and an average speedup of 1.63 were observed when
comparing the two relaxation methods. In general, solving a 2-D assignment problem
is significantly faster than solving a transportation problem. The transportation
problem obtained from relaxing constraint (2.3) is complex, and thus takes a longer
time to solve compared to the transportation problem reconstructed from the best 2-D
Algorithm 4 Partition pseudocode
1: function Partition(H, ⟨A*_n, P_n, f*_n⟩, w_ijk)
2:   for each ⟨i*, j*, k*⟩ ∈ A*_n do
3:     w_⟨i*,j*,k*⟩ = −∞
4:   end for
5:   for each ⟨i*, j*, k*⟩ ∈ A*_n do
6:     for each row i* ∈ A* do
7:       B_r = max_{j,k} {w_ijk − u(i*) − v(j) − µ(k)}
8:       B_c = max_{i,k} {w_ijk − u(i) − v(j*) − µ(k)}
9:       B_{i*} = B_r + B_c
10:    end for
11:    (B, i*) = min(B_{i*} ≠ −∞)
12:    ⟨i*, j*, k*⟩ = A*_n(i*)
13:    f*_{nr} = f*_n + B
14:    A_{nr} ← A*_n − ⟨i*, j*, k*⟩
assignment solution when constraint (2.4) is relaxed. Relaxing constraint (2.4) also
consistently resulted in a smaller duality gap compared to when constraint (2.3) was
relaxed due to the fact that |k|= R < N = |j|. This implies that when constraint (2.4)
is relaxed, a smaller number of elements in the 3-D reward tensor are removed when
constructing the 2-D subproblem, i.e., since a smaller number of elements are removed,
there is a higher likelihood that a better solution remains. For these reasons, the
remaining experiments used the m-best 3-D assignment algorithm with constraint (2.4) relaxed.
Algorithm 5 3-D assignment subroutine
1: function 3DAssign(w_ijk, ⟨A*_n, P_n, f*_n⟩)
2:   f* = −∞; lb = −∞; q* = ∞; maxIter = 20
3:   MAX = true, n3 = R
4:   FixList, v, Φ ← A*_n
5:   for curIter = 1 to maxIter do
6:     C = max_k (w_ijk − µ_k)
7:     for ⟨i*, j*, k*⟩ ∈ A*_n.FixList do
8:       C[i*, j*] = w[i*, j*, k*]
9:     end for
10:    if Φ == ∅ then
11:      (Φ, u, v, φ) = JVC(C, MAX)
12:    else
13:      (Φ, u, v, φ) = Augment(C, Φ, v, MAX)
14:    end if
15:    q* = min(q, φ + n3 · Σ_k µ_k)
16:    for each row do
17:      T[row] = w[row, Φ[row], k] ∀k
18:    end for
19:    (Ω, ω) = Transportation(T, MAX)
20:    if ω ≥ lb then
21:      lb = ω
22:      f* = lb
23:    end if
24:    gap = |q* − f*| / |f*|
Table 2.2
10 Monte Carlo Runs for different Lagrangian Relaxation methods
To measure and quantify which algorithms best solve the 2-D assignment and trans-
portation problems within the 3-D assignment problem, we compared the runtimes of
the 3-D assignment algorithm when using the JVC or the auction algorithms for the
2-D assignment problem, and Transauction or simplex-based transportation algorithms
for the transportation problem. A tensor was generated with elements sampled from
a uniform distribution in the interval [0,1] for tensor sizes ranging from 30×30×8
to 60×60×8 with increments of N = 5. Any combination of the 2-D assignment
algorithms with the transportation algorithms resulted in the same assignments and
objective function values. An example of the objective function values of a sample
tensor of dimension 30×30×8 is shown in Fig. 2.4 when the algorithm was run to
Figure 2.4: Example objective function values for a tensor of dimension 30×30×8 with
values uniformly distributed on the interval [0,1].
obtain the top $10^4$ solutions. As shown in Fig. 2.4, even when $10^4$ assignments were
obtained, the maximum and minimum objective function values obtained from the
assignment solutions had minimal variation and the difference was relatively small for
all tensor dimensions tested; however, as shown in Fig. 2.5, the m-best 3-D assignment
algorithm that invoked the JVC algorithm was, on average, 3 times faster than when
the auction algorithm was used. The RELAX-IV algorithm
was used to solve the transportation problem in this experiment.
Similar tests were performed to evaluate the best algorithm to solve the transportation
problem. Assuming that the JVC algorithm would be invoked to solve the 2-D
Figure 2.5: The CPU runtimes for the JVC and auction algorithms were compared as a
function of varying tensor dimensions. The JVC algorithm consistently outperformed the
auction algorithm.
assignment portion of the problem, Fig. 2.6 demonstrates that the simplex-based
transportation algorithm was significantly slower compared to both the Transauction
and RELAX-IV algorithms. In general, the RELAX-IV algorithm had the fastest
runtime speed. The maximum observed speedup of RELAX-IV in comparison to
the simplex-based transportation and the Transauction algorithms was 21.4 and 2.4,
respectively. Overall, the RELAX-IV algorithm dominated both the simplex-based
transportation and the Transauction approaches to the transportation problem, on
average solving it nearly 17 and 1.6 times faster, respectively. Based on these findings,
the JVC and RELAX-IV algorithms were selected to solve the 2-D assignment and
transportation problems within the m-best 3-D assignment problem, respectively, for
the remaining computational experiments. We optimized the m-best 3-D assignment
Figure 2.6: The CPU runtimes for the Transauction and simplex-based transportation
algorithms were compared as a function of differing tensor dimensions. The Transauction
algorithm remained relatively unaffected by the increase in the reward tensor size, while the
transportation algorithm took orders of magnitude more time to find the same assignments.
Figure 2.7: Objective function values for tensor size 30×30×8 with various optimization
combinations.
in our problem setup did not serve as an accurate estimate of the initial upper bound.
All other combinations of optimization methods were comparable to Murty’s
search space decomposition. Therefore, optimization method combinations A+B(iii)
and A+B(iv) were removed from the remaining tests.
compared to the original Murty’s search space decomposition. As shown in Fig. 2.8,
methods A+B(i)+C and A+B(ii)+C obtained speedups with very little variation
in the objective function values originally found by Murty’s search space
decomposition for all tensor sizes except that of dimension 60×60×8. The reason
for such a slow down is explained later in Section 2.6.6. Furthermore, combinations
A+B(i)+C and A+B(ii)+C were able to obtain objective function values slightly
better (higher) than Murty’s original method (on the order of $10^{-6}$). This
phenomenon is due to the Lagrangian relaxation algorithm’s approximation of the
3-D assignment problem. The search space decomposition method is suboptimal
when applied to the 3-D assignment problem (due to the suboptimal nature of the
Lagrangian relaxation algorithm), and so from our analysis we observed that, through
the particular optimization method combinations of A+B(i)+C and A+B(ii)+C,
better feasible solutions were found. These methods were also significantly faster,
offering average speedups of 2.1 and 2.4, respectively, as illustrated in Fig. 2.8. To
investigate these combinations more thoroughly, Monte Carlo runs were performed on
these two combinations only.
To measure both the overall scalability and consistency, 10 Monte Carlo runs were
performed for each tensor size varying from 30×30×8 to 60×60×8 in increments of
N = 10 and using the two specific optimization method combinations of A+B(i)+C
and A+B(ii)+C. Each test tensor was generated with elements uniformly distributed
in the interval [0,1] and $10^4$ solutions were obtained for each tensor. In each run, the
objective function values and the simulation runtime were monitored and compared
Table 2.3
Simulation runtime in CPU seconds for various combinations of decomposition methods

Tensor Size | Original Murty | A      | A+B(i) | A+B(ii) | A+B(i)+C | A+B(ii)+C
30×30×8     | 139.17         | 157.32 | 193.45 | 124.30  | 51.69    | 44.96
35×35×8     | 189.03         | 230.78 | 400.78 | 284.66  | 74.95    | 62.67
40×40×8     | 235.56         | 293.40 | 429.86 | 391.12  | 76.85    | 73.50
45×45×8     | 178.70         | 226.16 | 398.22 | 337.41  | 102.21   | 87.33
50×50×8     | 288.11         | 370.07 | 803.69 | 491.81  | 99.15    | 79.73
55×55×8     | 244.75         | 303.14 | 489.72 | 477.44  | 207.71   | 204.60
60×60×8     | 152.72         | 209.78 | 463.33 | 427.37  | 309.65   | 232.06
Figure 2.8: Percentage error compared against the speedup for the combinations of
optimization methods tested for all tensor sizes, varied from 30×30×8 to 60×60×8 with an
increment of N = 5.
A crossover occurs (when N ≈ 55) in computation time between obtaining a feasible solution and the
m-best optimization methods (e.g., partitioning or sorting), as seen in Fig. 2.10. The
increase in dimension N does not necessarily mean an increase in the computation time
of the 3-D assignment algorithm, since both optimization methods A+B(i)+C and
A+B(ii)+C reduce the frequency of calling the 3-D assignment algorithm; however,
partitioning and/or sorting the larger subproblems may become more difficult. A
more favorable speedup may be observed if the algorithms were to be implemented
in a fast object-oriented programming language. Overall, for a 30×30×8 tensor, the
m-best 3-D assignment algorithm utilizing optimization method A+B(ii)+C took an
average of 4.9 milliseconds to obtain a single solution to the 3-D assignment problem.
Table 2.4
Minimum, maximum, and average runtimes in CPU seconds to obtain $10^4$ solutions
As mentioned in Sections 2.2.1 and 2.3.1.4, the value of $m_k$ should be such that $\sum_{k=1}^{R} m_k \geq N$. For problems where $m_k = R \geq N$, the 3-D assignment problem
Figure 2.9: Box plot for the average percentage error (as compared to the original Murty
search space decomposition method) for the optimization method combinations A+B(i)+C
and A+B(ii)+C.
Figure 2.10: The average CPU runtimes for 10 Monte Carlo runs for the two optimization
method combinations tested.
reduce the need for the 3-D assignment routine invocation and, therefore, is able
to obtain $10^4$ solutions in a relatively short amount of time (< 5 minutes). The
tensor of dimension 30×30×6 had an increase in average CPU runtime compared to a
tensor of dimension 30×30×10 when considering the optimized methods A+B(i)+C
and A+B(ii)+C. This is due to the nonzero duality gap, which impacts the partitioning
procedure and subsequently requires more subproblems to be solved before obtaining
all m-best solutions. Intuitively, due to the nature of the problem, a tensor of dimension
30×30×6 is more likely to violate constraint (2.4).
Table 2.5
Minimum, maximum, and average runtimes in CPU seconds to obtain $10^4$ solutions
Figure 2.11: Box plot for the average percentage error (as compared to the original Murty
search space decomposition method) for the optimization method combinations A+B(i)+C
and A+B(ii)+C, where N = 30.
Figure 2.12: The average CPU runtimes (s) over 10 Monte Carlo runs versus tensor size R (with N = 30) for the original Murty method and the two optimization method combinations tested.
Chapter 3
3.1 Introduction
3.1.1 Motivation
The illicit drug trade is an extremely profitable industry: it is estimated that consumers in the United States alone spend as much as 150 billion USD per year on black market drugs, of which an estimated 37 billion USD is spent on cocaine. It is a problem of national, and increasingly international, concern [171], [172]. The problem has been exacerbated by the advent of narco-terrorism and the prospect of terrorists using narcotics smuggling techniques to transport terrorists or weapons of mass destruction into the country. Given the
reduction in the national resources allocated to the counter-narcotics threat, it is of
paramount importance that smarter and faster decision support tools that integrate a
wide variety of information are developed to assist in this challenge of using less to
accomplish more. To do so requires effective hybrid human-machine systems.
The US Navy has shown a growing interest in mixed-initiative human-machine
systems and mastering information dominance for effective context-driven operations
[173]. To do so requires the transfer of the right data from the right sources in the
right context to the right decision maker (DM) at the right time for the right purpose
– a concept known as 6R [163]. If a dynamically developing operational context can
be understood by the DM, appropriate courses of action can be carried out, given
the unfolding events. In the context of maritime operations, DMs must assimilate
information from a multitude of sources before making decisions on the strategy to be
followed each day. If the DMs are better informed about what to expect given the
currently accessible data, as well as what they might expect in the case of unforeseen
events, effective decisions can be made on the courses of action.
Currently, much planning for narcotics seizures is performed by humans interpreting
large amounts of data, including weather forecasts, intelligence, and recently reported
contacts of interest. Each day, the targeting analysts must process and interpret all
of this data and agree upon a course of action amounting to where limited detection
aircraft and interdiction vessels should be allocated to disrupt the maximum amount
of shipments over a multi-day planning cycle. The consolidation of large amounts of
data and possible strategies into a single asset allocation optimizer is beneficial for
both algorithmic purposes and human understanding. To support this transition to a
human-machine collaborative mode of operation, we have developed an optimization-
based modeling framework and the associated decision support software tool for
Figure 3.1: The counter-smuggling problem viewed from a stochastic control standpoint.
Targeteers (decision makers) choose from a set of available surveillance assets and finalize a
search schedule to allocate the asset(s) over a near-time planning horizon, typically 72
hours. Similar to the planning/decision process presented in [4, 7, 159, 161], after the action
is carried out, information is gathered, processed, and fed back to the targeteer.
The interdiction component, detailed in [161], involves the allocation of multiple heterogeneous surface
assets (viz., Navy ships, Coast Guard cutters), to disrupt multiple drug smugglers of
varying types, similar to that which is addressed in this chapter. The DMs in Fig. 3.1
choose which surveillance assets to allocate to which target(s) (smugglers) based on
the target type and intelligence forecasting the target’s trajectory (specified in the
form of probability of activity (PoA) surfaces [67, 159]). After allocated assets attempt
to search for potential targets, the mission environment changes due to any target
detection that may occur or due to weather changes. These environment changes are
recorded by sensors and operators, processed, and sent back to the DMs in the form of
target types and tracks, and are combined into an updated PoA surface, providing a
new forecast for the remainder of the planning time horizon. The process then repeats.
Ideally, the results of this chapter feed that of [161] for coordinated smuggler detection
and interdiction.
The surveillance mission involves the search, detection, tracking and identification of
potential smugglers within a large geographic region, which plays an essential role
in the counter-smuggling operation. Airborne surveillance assets (e.g., helicopters,
maritime patrol aircraft) are highly efficient at determining the sea surface traffic
information. However, in a real world scenario, there is typically a limited number
of surveillance assets and a large sea surface area that needs to be surveilled. The
study of how to most effectively employ limited resources to locate an object, whose
location is not precisely known, falls under the rubric of search theory.
The earliest foundations of search theory were built by Koopman [92] to aid the U.S.
Navy in efficiently locating enemy submarines during World War II, which was further
generalized in [93]. There are two major categories of search theory: 1) the optimal
allocation of effort problem, and 2) the best track problem [130]. For the optimal
effort allocation problem, Blachman and Proschan [24] derived an optimum search
pattern for a generalized problem of finding an object in one of n boxes. Pollock [143]
introduced a Bayesian approach to the optimal allocation problem, where allocation
decisions are made sequentially based on observations up to the current time in order
to minimize the expected cost of searching to satisfy a specified probability of detection.
Charnes and Cooper [33] applied convex programming, along with the Kuhn-Tucker conditions, to compute the optimum distribution of search effort. In this chapter, we adopt Charnes and Cooper's method to compute the effort required for the optimal search in a discretized map.
Stone [166] made use of the calculus of variations, convexity properties, and
generalized Lagrange multiplier techniques to formulate a systematic treatment of
search theory. For the best track problem, Lukka [104] worked out the theory of
optimal search for stationary targets, targets whose motion is known, and targets
whose motion is almost known. The method relies on the theory of optimal control.
Mangel [108] extended Lukka’s algorithms with the option of incorporating a detection
rate that is either independent of or dependent on velocity.
In recent years, the problem of drug surveillance has been formulated from a variety
of viewpoints. For example, Washburn [179] formulated the surveillance problem as a
two-person zero-sum game and Pfeiff [137] applied search theory to a defender-attacker
optimization model that maximizes the defender’s probability of success. Royset and
Wood approach the problem as a network flow problem, wherein an interdictor must
destroy a set of arcs on a network to minimize both the interdiction cost and minimize
the maximum flow of smugglers [153]. Jacobson [81] formulates the problem as a
multiple traveling salesman problem with the objective of minimizing the overall search
route cost for multiple platforms that visit every search location. Ng and Sancho [129]
developed a dynamic programming method to solve the surveillance problem. However,
the dynamic programming approach suffers from the curse of dimensionality for large
problems and, consequently, near-optimal approximations are needed. A common
way to overcome this curse is by approaching the problem via approximate dynamic
programming with policy iteration as in [76], where they frame the problem in terms
of stochastic control with partially observable Markov decision processes. Kress et
al. [95] examine a discrete-time and discrete-space stochastic dynamic programming
approach to coordinate the efforts of a single aerial search asset and a single surface
interdiction asset. Other approaches, including the formulation of the surveillance
problem as a resource-dependent orienteering problem [29, 141, 142], wherein reward
depends on the resource expended at each visited node, have been investigated.
Optimal search problem formulations have become versatile in their ability to ac-
count for multiple cooperating searchers, multiple targets with different characteristics,
as well as environmental effects on the search [128, 152, 155, 182]. For example, arc
inspection is based on the inverse of the probabilities of detection as opposed to PoA
surfaces accounting for weather and intelligence in [4, 67, 159]. Byers [27] extended
the network modeling approach to drug interdiction by including Bayesian updating
of the PoA surface. He considered a scenario with one unmanned aerial vehicle and
one ground-based interceptor to interdict multiple targets with different deadlines.
Bessman [22] developed a defender-attacker optimization model that uses the PoA
surfaces as the basis for asset allocation against smugglers. He formulated a stochastic
shortest path problem and represented smuggler behavior as the output of an all-to-one
label-correcting shortest path algorithm, with temporal dependence instead of one-step dependence.
sensor types (one interdiction and two surveillance) are considered for allocation to
prosecute one type of target (among three possible). In this defender-attacker model,
smugglers are assumed to have imperfect knowledge of possible sensor locations and
are given the ability to modify their behavior in response to this information.
Similar to Pietz and Royset [141], we also discretized our maritime map. We adopt Charnes and Cooper's method [33] to compute the effort required for optimal search in a discretized map. Our novel algorithmic contributions are the following:
1) Fast 1- and 2-step lookahead approximate dynamic programming algorithms for maritime surveillance with heterogeneous assets and heterogeneous targets, where each target may carry a different amount of contraband. Our algorithms exploit the fusion of intelligence and weather information available in the probability of activity (PoA) surfaces.
2) Two variations of the approximate dynamic programming-based surveillance asset allocation algorithms, wherein real-world constraints on the assets (e.g., endurance and rest time) are explicitly considered. We measure the utility of our approach by comparison with more traditional branch-and-cut algorithms for solving the surveillance problem.
The chapter is organized as follows. Section 3.2 describes the problem and the
technical challenges addressed in the development of allocation algorithms underlying
our decision support tool. In Section 3.3, we discuss solution approaches, including
exhaustive and greedy branch-and-cut and approximate dynamic programming. In
Section 3.4, we present simulation results as applied to a benchmark scenario that
has multiple targets, multiple surveillance assets and parameters that have multiple
levels of uncertainty. We additionally conduct and present results from our sensitivity
analysis relating to the scalability and performance of our solution approaches in a
realistic mission scenario.
The complete maritime surveillance and interdiction problem is one of maritime drug
trafficking disruption in the East Pacific Ocean and the Caribbean Sea. The general
mission consists of two components: 1) surveillance (the detection, tracking, and
identification of contacts of interest) and, 2) interdiction (the interception, investigation,
and potential apprehension and repatriation of smugglers). In response to the need
for information fusion, we proposed a decision support system (DSS) in [160], named
COAST, to host and utilize algorithms to provide auxiliary support to JIATF-South
targeteers. We proposed different forms of visualizations to enable DMs to understand
the behavior of our algorithms and the presently evolving context, while also providing
functionality for human input and interaction in order to effectively integrate both
humans and decision support algorithms for mixed-initiative planning. The information
flow for the complete maritime interdiction problem is illustrated in Fig. 3.2.
In COAST, we solve a moving horizon dynamic resource management problem for
both surveillance and interdiction operations based on user-defined mission parameters.
We then provide suggested courses of action (COAs) that the DMs can interact with,
adjust and fine tune to analyze various “what-if” scenarios and to obtain a satisfactory
allocation. Visual and computational analytics are provided to communicate the
reasons behind our algorithm’s behavior. From Fig. 3.2, continuously updated PoA
surfaces (see Fig. 3.3 for an example), representing the posterior probabilities of
smugglers' presence, constitute the sufficient statistics for decision making [18]; that is, COAST does not need to know the specific intel or meteorology and oceanography (METOC) features (e.g., the uncertainty associated with a drug trafficker, wave heights, currents, etc.), nor how these two inputs, along with asset and target models, are combined to produce the PoA surface. A targeteer can fine-tune the allocations, the resulting COAs are executed, and observations from surveillance and interdiction assets are sent back to the reachback cell in the form of situational reports (SITREPs), e.g., detections or non-detections, which are used to update the PoAs. The targeteer can specify multiple objective functions. The objectives considered and analyzed in this chapter are:

O1: maximize the expected weight of contraband detected;
O2: maximize the expected number of detections;
O3: maximize the expected number of smugglers detected.
Let α_j and ρ_j denote the expected contraband weight and the expected number of smugglers for case j. Let C be the total number of cases (i.e., predicted smuggler
Figure 3.2: Information flow and decisions (controls) in the counter-smuggling problem.
The decision support tool, Courses Of Action Simulation Tool (COAST), provides courses
of action (COAs) to the JIATF-South Targeting Team who then modify them as they see
fit. The manually entered COAs can then be fed back into the tool where the simulation is
rerun providing new outcomes to the targeting team, who can then provide further feedback
and modifications, if necessary.
tracks) to be searched. Then, the normalized priority weights for Objectives O1–O3, respectively, are as follows:

$$\lambda_j = \frac{\alpha_j}{\sum_{g=1}^{C} \alpha_g} \qquad (3.1)$$

$$\lambda_j = \frac{1}{C} \qquad (3.2)$$

$$\lambda_j = \frac{\rho_j}{\sum_{g=1}^{C} \rho_g} \qquad (3.3)$$
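To make the weighting concrete, the following is a minimal Python sketch of (3.1)–(3.3), assuming the case statistics are given as arrays; the function name and interface are illustrative, not part of COAST.

import numpy as np

def priority_weights(alpha, rho, objective):
    # Normalized case priority weights lambda_j, per eqs. (3.1)-(3.3).
    #   alpha : expected contraband weight alpha_j per case (drives O1)
    #   rho   : expected number of smugglers rho_j per case (drives O3)
    alpha = np.asarray(alpha, dtype=float)
    rho = np.asarray(rho, dtype=float)
    if objective == "O1":
        return alpha / alpha.sum()                    # eq. (3.1)
    if objective == "O2":
        return np.full(len(alpha), 1.0 / len(alpha))  # eq. (3.2)
    return rho / rho.sum()                            # eq. (3.3)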
Table 3.1
Summary of Notations
The foundation for each asset allocation solution is the PoA surface over multiple time epochs. The PoA surface is the result of combining METOC information with actionable intelligence with regard to uncertain smuggler departure point(s),
departure times, waypoint(s), destination(s), and their behavior on the ocean. The spatio-temporal probability surface, PoA, is calculated as the joint probability of two discrete random events: 1) the case j, with a corresponding binary random variable C_j, i.e., how trustworthy the intelligence source is regarding a target, and 2) the target corresponding to case j at a location q at time epoch k, with a corresponding binary random variable X(q, k, j), i.e., given that case j exists, the probability that the target exists at location q at time k. The probability surface PoA is indexed by a location q, time k, and case j, and is defined in (3.4)–(3.7).
where we separate the expectation in (3.6) based on the law of total (iterated) expectation. We assume that P(C_j = 1) = 1, that is, the intelligence sources are always correct with 100% certainty. Then, (3.7) reduces to (3.8).
Figure 3.3: PoA surface PoA(q, k, j) summed over all k.
The PoA surfaces are detailed in [67] and represent all the relevant information for effective asset allocation. The DM can specify how many planning epochs to optimize over based on these PoA surfaces and the objective function to be optimized. A typical PoA surface PoA(q, k, j), summed over all k, is shown in Fig. 3.3.
We assume the optimum distribution of search effort is known based on the model in [33]. Let p_{jkq} denote the PoA of target j in cell q at time k. We first rank the nonzero PoA cells in decreasing order such that p_{jk[1]} ≥ p_{jk[2]} ≥ ..., where [κ] denotes the κth largest nonzero PoA cell. Let the total available effort to be expended by asset i to search case j be Φ_{ij}. A critical threshold is then calculated to narrow the problem space and eliminate PoA cells not worth searching, by first finding an n that satisfies the following inequality [33]:

$$\sum_{v=1}^{n} v \left( \ln p_{jk[v]} - \ln p_{jk[v+1]} \right) > \Phi_{ij} \qquad (3.9)$$
Then, the critical probability, ρ_{ijk}, corresponding to the search of case j by asset i at time k, is as in (3.10):

$$\rho_{ijk} = \frac{1}{n} \left( \sum_{v=1}^{n} v \left( \ln p_{jk[v]} - \ln p_{jk[v+1]} \right) - \Phi_{ij} \right) + \ln p_{jk[n+1]} \qquad (3.10)$$
We then select all the cells corresponding to case j which have a probability of
activity greater than the critical probability found in (3.10). This reduces the number
of potential cells that need to be searched for each case j by asset i. We then compute
the patrol box that maximally covers the high probability cells for each case. The
allocation of assets to patrol boxes is the subject of the optimization problem discussed
next.
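The cell-selection step can be sketched in Python as follows, assuming the nonzero PoA values for a single case and epoch are supplied as a NumPy array; the function name and the log-domain comparison are illustrative choices rather than the tool's actual implementation.

import numpy as np

def select_search_cells(poa, effort):
    # Cell selection per eqs. (3.9)-(3.10) for one asset-case pair at epoch k.
    #   poa    : nonzero PoA values p_{jk[.]} over the candidate cells
    #   effort : total search effort Phi_ij
    p = np.sort(poa)[::-1]                  # rank cells in decreasing PoA order
    logp = np.log(p)
    n = None
    for m in range(1, len(p)):              # smallest n satisfying eq. (3.9)
        lhs = np.sum(np.arange(1, m + 1) * (logp[:m] - logp[1:m + 1]))
        if lhs > effort:
            n = m
            break
    if n is None:                           # effort suffices for every cell
        return np.arange(len(poa))
    lhs = np.sum(np.arange(1, n + 1) * (logp[:n] - logp[1:n + 1]))
    log_rho = (lhs - effort) / n + logp[n]  # critical threshold, eq. (3.10)
    # Keep only the cells whose PoA exceeds the critical probability
    return np.where(np.log(poa) > log_rho)[0]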
The case regions are labeled by aggregating the PoA surfaces over a discrete planning
time period of length K (e.g., 72 hours). Let us assume a moving horizon frame of
reference, where k = 0 corresponds to the current time period of unit length (∆ = 1 hour), k = 1 corresponds to the first planning period, and k = K corresponds to the final period to be planned for. Let A be the total number of surveillance assets, C the total number of cases, and q ∈ Q(j) the set of cells in the patrol box for case j, as determined by the optimal search effort calculation algorithm. The size of the patrol box depends on the concept of operations and is assumed known. Let w_{ijk} be the probability of successful detection (PoSD), which is the product of the PoA surface and the probability of detection (PD) when asset i is assigned to search for
case j at time k. That is,

$$w_{ijk} = \sum_{q \in Q(j)} PoA(q, k, j)\, PD(i, j, k), \qquad (3.11)$$
where

$$PD(i, j, k) = 1 - e^{-S_{ijk} v_i^s \Delta / A_j} \qquad (3.12)$$

is the probability that asset i detects case j during the kth time epoch interval (PD(i, j, k) can only be collected at the end of the kth time epoch interval). Let us assume that each asset travels to the search region at a speed v_i^a and searches within the search region at a speed v_i^s. The PD equation is adopted from Koopman's random search formula [92] and offers a lower bound on the probability of detection; advanced models may be used in place of (3.12), as in [166]. Here, S_{ijk} is the sweep width of asset i searching for case j at time epoch k, A_j is the area of the patrol box for case j, and ∆ is the inter-epoch interval (= 1 hour in this chapter).
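A minimal sketch of (3.11)–(3.12) follows; the argument names are illustrative, and the reading of A_j as the patrol-box area follows Koopman's random search formula.

import math

def posd(poa_cells, sweep_width, search_speed, area, dt=1.0):
    # w_ijk per eqs. (3.11)-(3.12) for one asset-case pair at one epoch.
    #   poa_cells    : PoA(q, k, j) over the patrol-box cells q in Q(j)
    #   sweep_width  : S_ijk (nm); search_speed : v_i^s (kn)
    #   area         : A_j, patrol-box area (nm^2); dt : Delta (h)
    pd = 1.0 - math.exp(-sweep_width * search_speed * dt / area)  # eq. (3.12)
    return sum(poa_cells) * pd                                    # eq. (3.11)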
Let B_{ij} represent the geodesic¹ distance that asset i must traverse from its base to the centroid of case j. The time it takes to traverse B_{ij}, denoted by t_{ij}, is given by

$$t_{ij} = \left\lceil \frac{B_{ij}}{v_i^a} \right\rceil, \qquad (3.13)$$

where ⌈·⌉ denotes the ceiling, i.e., rounding up to the nearest integer. Let τ_{iℓ} denote the departure time if asset i is allocated to a case for flight ℓ, and d_{iℓ} the landing time upon its return from the corresponding search box. The index ℓ increments with each flight that asset i is scheduled to fly over the planning time horizon. Formally,

¹The geodesic distance is the shortest distance between two points on the surface of a sphere.
$$\tau_{i\ell} = \begin{cases} k, \ 0 < k \le K, & \text{if } i \text{ is assigned to a case during the } \ell\text{th flight} \\ \infty, & \text{otherwise} \end{cases} \qquad (3.14)$$
A similar definition applies to d_{iℓ}. For each flight, the total search and travel time for each asset from its corresponding base to each case must not exceed the asset's endurance, L_i (in hours), and, upon flight completion, it must rest for R_i consecutive hours before it can be scheduled to depart for the next search box. The assets are assumed to be manned aircraft with an associated rest time for the pilot; additionally, each aircraft requires periodic maintenance and refueling. The minimum time it may take for an asset to become available again for search is L_i + R_i. Note that there is no feasible asset allocation for a case j and asset i if 2t_{ij} ≥ L_i, i.e., if the total round-trip travel time for a search region is greater than the maximum aloft time L_i. With PoSD defined as in (3.11), the cumulative probability of successful detection (CPoSD) for a given asset i is
$$CPoSD(i, j) = 1 - \prod_{k=1}^{K} (1 - w_{ijk} x_{ijk}), \qquad (3.15)$$
where x_{ijk} is a binary decision variable such that x_{ijk} = 1 if asset i is assigned to case j at time epoch k, and 0 otherwise. The total reward that asset i can collect over the planning time horizon is then

$$r_i = \sum_{j} \lambda_j\, CPoSD(i, j), \qquad (3.16)$$

where λ_j is the normalized priority weight of case j. We wish to solve the following problem:
$$J = \max_{x_{ijk},\, \tau_{i\ell},\, d_{i\ell}} \sum_{i=1}^{A} r_i \qquad (3.17)$$

$$\text{s.t.} \quad \sum_{i} x_{ijk} \le 1 \quad \forall j, k \qquad (3.18)$$

$$\sum_{j} x_{ijk} \le 1 \quad \forall i, k \qquad (3.19)$$
In (3.17), we assume that the surveillance asset cannot detect targets while it is en route to the patrol box. Constraint (3.18) ensures that no more than one asset is allocated to a case at any time, while (3.19) ensures that no more than one case is allocated to an asset at any time. Constraint (3.20) indicates that the maximum asset aloft time must not exceed L_i. Constraint (3.21) ensures that there must be a minimum downtime of R_i between allocations for a particular asset i, and that subsequent allocations must have departure times later than the previous one(s), if any. The problem posed in (3.17)–(3.23) is NP-hard [59].
3.3 Solution Approach
Figure 3.4: Branching method with τ_{i1} and τ_{i2} being the departure times for the first and second flights and the corresponding case assignments j_1 and j_2. The reward r_i is evaluated using (3.15) and (3.16) for each completed branch. The highest r_i is then saved as the best assignment for asset i.
Similar to E-B&C, we repeat the asset allocation process for all the available assets
and fix the assignment for an asset i∗ with the highest ri . After the asset-case-time
epoch assignment is fixed, we update the PoA surface to ensure that the assigned
cases are no longer available for additional scheduling during the assigned search hours.
The same process is then repeated until either no more assets are available or all cases
are fully allocated. We refer to this method as GB&C-I. The pseudocode is shown in
Algorithm 7. In this pseudocode, line 1 states that while there are any unassigned
assets, continue on to lines 2–7, where the best assignment for each unassigned asset is
found using B&C. The best asset assignment is then selected in line 8 (i.e., i∗ becomes
known among the explored potential assignments). In lines 9–11, the PoA surface is
updated given the asset assignment found.
To reduce the runtime and problem complexity, we propose a second greedy Branch-
and-Cut method, referred to as GB&C-II. This method is similar to the E-B&C
method, except that we put an additional constraint on assets. Once we enumerate all
the possible departure times and find the best assignment {j ∗ } corresponding to each
departure time for an asset i, we fix the corresponding schedule. That is, we reduce
the complexity of search with more than one asset from permutation ordering to a
linear ordering. The same process is then repeated until all cases are fully allocated
or there are no more assets available. The pseudocode is shown in Algorithm 8. Here,
line 2 finds the best assignment for asset i found in line 1. Line 3 updates the PoA
surface and line 4 saves the best assignment found in line 2.
The serial GB&C-II algorithm, executed on a single processor, searches the branch-and-cut tree by expanding live nodes one at a time. To parallelize this problem on M processors, we assign each candidate τ_{i1} to a processor and let each processor execute the corresponding subproblem. All processes share the same memory for the PoA and other read-only data. Lastly, the master processor collects the values returned by the slave processors to evaluate the best assignment for asset i.
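A minimal sketch of this master-slave scheme using Python's multiprocessing module is given below; evaluate_departure is a hypothetical stand-in for the actual GB&C-II branch expansion, and the toy reward is a placeholder only.

from multiprocessing import Pool

def evaluate_departure(tau_1):
    # Stand-in for the subproblem rooted at first-flight departure time tau_1:
    # expand that branch and return (reward, schedule).
    reward = 1.0 / (1.0 + tau_1)          # placeholder computation only
    return reward, {"tau_1": tau_1}

def parallel_gbc2(candidate_departures, processes=4):
    # One slave per candidate departure time; the master keeps the best result.
    with Pool(processes) as pool:
        results = pool.map(evaluate_departure, candidate_departures)
    return max(results, key=lambda r: r[0])

if __name__ == "__main__":
    print(parallel_gbc2(range(24)))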
where j_k is the state-based control variable that selects a case j at time epoch k, given that an asset-case assignment has been made at time epoch k − 1 and the asset is currently in a flight state. The detailed control options are described later in this section (see (3.31) and (3.32)).
The approximate dynamic programming equation for the problem is defined as follows:

$$g_k(z_k, j_k) = \lambda_{j_k} \left( 1 - \prod_{k \in s_{j_k} | z_k} (1 - w_{i j_k k}) \right) \qquad (3.26)$$
where s_{j_k} is the set of remaining search time indices available within the current sortie for asset i assigned to case j_k, and Λ(k) is a function indicating that the asset is currently flying its ℓth flight at time k. The variable λ_j is the normalized priority weight for case j. Here, J̄_{k+1} is the heuristic cost-to-go, estimated based on the following assumptions:
H1: The asset will fly out for its maximum aloft time.
H2: Each asset will stay on just one case for each flight.
H4: The case with the highest total reward will be selected for the ℓth flight interval, as in (3.28).
$$j^* = \arg\max_{j} \lambda_j \left( 1 - \prod_{k \in \gamma(i, j, \ell)} (1 - w_{ijk}) \right) \qquad (3.28)$$
where γ(i, j, ℓ) is the set of search time indices for asset i assigned to case j for the ℓth flight. If the planning time horizon allows multiple flights, then we first compute the best case for the next flight, defined by H1 to H3, using (3.28). The future cost-to-go for the ℓth flight is as follows:

$$H(\ell) = \lambda_{j^*} \left( 1 - \prod_{k \in \gamma(i, j^*, \ell)} (1 - w_{i j^* k}) \right) \qquad (3.29)$$
where j* is computed from (3.28). The heuristic cost-to-go, given that the asset is flying its Λ(k)th flight at time k, is

$$\bar{J}_{k+1}(f(z_k, j_k, \ell_k)) = \sum_{n=\Lambda(k)+1}^{\lceil K/(L_i+R_i) \rceil} H(n) \qquad (3.30)$$
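A compact sketch of the heuristic in (3.28)–(3.29) follows, under the simplifying assumption that the flight's search-time index set γ(i, j, ℓ) is the same for all cases; names are illustrative.

import numpy as np

def flight_heuristic(w_i, lam, flight_epochs):
    # Best case j* (eq. (3.28)) and heuristic reward H(l) (eq. (3.29)).
    #   w_i           : (C, K) array of PoSD values w_ijk for asset i
    #   lam           : length-C vector of priority weights lambda_j
    #   flight_epochs : list of epoch indices searched during the l-th flight
    rewards = lam * (1.0 - np.prod(1.0 - w_i[:, flight_epochs], axis=1))
    j_star = int(np.argmax(rewards))
    return j_star, float(rewards[j_star])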
A comparison of the expected reward between launching the asset at the current hour and launching it at the next hour is performed using rollout with the heuristic defined above. If launching the asset during the current time epoch results in a higher reward, then the asset is assigned, for the first search hour, to the case with the highest total reward per (3.28). If launching the asset during the next time epoch results in a higher reward, then we simply increment the time epoch and repeat the process. Fig. 3.5 illustrates this rollout heuristic for determining the expected reward of launching at a different hour.
Figure 3.5: Illustration of rollout for deciding when to fly, with travel time (green) and search time (blue).
When the asset is in flight, for the second through final hour of the search, the control variable j_k takes on a different set of values, detailed as follows:

$$j_k = \begin{cases} j_{k-1}, & \text{stay on the current case} \\ \tilde{j} \neq j_{k-1}, & \text{switch to a different case, at the cost of additional travel time} \end{cases} \qquad (3.32)$$
We illustrate the computation of the heuristic for the 1-step lookahead rollout in Fig. 3.6. The first example illustrates the situation when the surveillance asset is searching for case j and chooses to stay on case j for the remaining search interval. The second example illustrates the situation wherein the asset currently searching for case j switches to a new case j̃ ≠ j, while considering the cost of additional travel time from case j to case j̃. The travel time from the new case j̃ to the asset's home base then becomes the asset's new return travel time. The optimal control action is selected based on the maximum expected reward, as in (3.27). This process is repeated for each time epoch k to obtain a feasible asset-case assignment over the planning horizon.
Figure 3.6: Illustration of the 1-step lookahead heuristic over the planning horizon: (1) staying on case j; (2) switching to case j̃ ≠ j.
Algorithm 9 shows the pseudocode for mSLADP-I. In it, line 1 generates an m-length permutation of the asset order, where m specifies the size of the subset permutation to be used for the initial asset allocation. Lines 4–7 find the best allocation given each asset i in a specific asset order from line 1 and update the PoA surface accordingly. Then, lines 12–18 compute the best assignment for the remaining unassigned assets: line 12 finds the best assignment for each unassigned asset, line 17 selects the best asset i* for allocation, and the PoA surface is subsequently updated in line 18. Lines 20–24 save the complete assignment and reset the parameters for the next asset permutation sequence generated in line 1.
Algorithm 9 mSLADP-I
1: PermSeq = Perm({1, . . . , A}, m)  ▷ m-length permutation of asset order, where m specifies the size of the subset permutation to be used for initial asset allocation
2: for each AssetSeq in PermSeq do
3: for each i ∈ AssetSeq do
4: assign(i) = ADP(i)
5: BestAssign ← BestAssign + assign(i)
6: updatePoA(assign(i))
7: AssignedAsset ← AssignedAsset + i
8: end for
9: while length(AssignedAsset) ≤ A do
10: for each i ∈ {1, . . . , A} do
11: if i ∉ AssignedAsset then
12: assignTemp(i) = ADP(i)
13: end if
14: end for
15: b_assign(i*) = MaxReward(assignTemp)  ▷ Given the previous allocations, select the asset assignment with the highest r_i among the remaining available assets
16: AssignedAsset ← AssignedAsset + i∗
17: BestAssign ← BestAssign + b assign(i∗ )
18: updatePoA(b_assign(i*))
19: end while
20: PotentialAssign ← PotentialAssign + BestAssign
21: assign = ∅
22: AssignedAsset = ∅
23: BestAssign = ∅
24: resetPoA
25: end for
26: BestAssign = MaxReward(PotentialAssign)
Algorithm 10 mSLADP-II
1: PermSeq = Perm({1, . . . , A}, m)
2: for each AssetSeq in PermSeq do
3: for each i ∈ AssetSeq do
4: assign(i) = ADP(i)
5: BestAssign ← BestAssign + assign(i)
6: updatePoA(assign(i))
7: AssignedAsset ← AssignedAsset + i
8: end for
9: for each i ∈ {1, . . . , A} do
10:   if i ∉ AssignedAsset then
11:     assign(i) = ADP(i)
12:     AssignedAsset ← AssignedAsset + i
13:     BestAssign ← BestAssign + assign(i)
14:     updatePoA(assign(i))
15:   end if
16: end for
17: PotentialAssign ← PotentialAssign + BestAssign
18: assign = ∅
19: AssignedAsset = ∅
20: BestAssign = ∅
21: resetPoA
22: end for
23: BestAssign = MaxReward(PotentialAssign)
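The greedy outer loop shared by GB&C-I and the mSLADP variants can be sketched as follows; best_assignment_for and update_poa are assumed callbacks standing in for the single-asset B&C/ADP subproblem solver and the PoA update, respectively.

def greedy_sequential_allocation(assets, best_assignment_for, update_poa):
    # Repeatedly solve the single-asset subproblem for every unassigned asset,
    # commit the highest-reward assignment, and update the PoA surface so the
    # assigned search hours become unavailable to later assets.
    unassigned = set(assets)
    schedule = {}
    while unassigned:
        candidates = {i: best_assignment_for(i) for i in unassigned}
        i_star = max(candidates, key=lambda i: candidates[i][0])  # best r_i
        schedule[i_star] = candidates[i_star][1]
        update_poa(candidates[i_star][1])
        unassigned.remove(i_star)
    return schedule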
3.4.1 Scenario Description(s)
There are two main areas of operation in the simulated scenario: the East Pacific Ocean
and the Caribbean Sea. The PoA surfaces corresponding to this area of responsibility
(AOR) were partitioned into a grid of 90 × 138 cells, where each cell is a square
with a side length of 30 nautical miles. The total area of the AOR was ≈ 11 million
square nautical miles. The lower left corner of the rectangular AOR had a latitude
and longitude of 10◦ S, 110◦ W, respectively.
The PoA surfaces forecasted ten smuggler cases, of which five were located in
the East Pacific Ocean and the remaining five were located in the Caribbean Sea.
The details for each case can be found in Table 3.2 and Fig. 3.8. These cases are
generated based on Navy intelligence, which typically comprises estimates of the
expected number of smugglers on board and the size of the contraband shipment.
Often there are few “active” cases, i.e., cases which targeteers deem to have sufficient
actionable intelligence to allocate assets to. We assume the PoA surfaces reflect
the spatio-temporal probabilities pertaining to such “active” cases. Four different
types of smuggler vessels were considered: 1) Go Fast – small, fast boats capable of
reaching high speeds, 2) Panga – modest-sized, fast boats, that are easy to build by
the smugglers. 3) Self-Propelled Semi-Submersible (SPSS) – narco-submarines capable
of shifting heavy loads long distances while almost submerged under the ocean’s
surface [180], and 4) Fully submerged vessel – makeshift submarine-like vessels that
can remain submerged with large quantities of cocaine aboard. Each case had a unique
departure, destination, and waypoint combination. Waypoints are defined as possible
areas in the ocean where the cargo is transferred to another vessel or a change in
trajectory of the smuggler is predicted. Additionally, each case also had an associated
Table 3.2
Smuggler Cases
payload measured in kg of cocaine; this is relevant when we run the algorithm with Objective O1. An important fact to note is that each case had different start and end times. Fig. 3.7 details the time epochs when each smuggler case is deemed active. Cases with high uncertainty had wide bands of PoA. The amount of uncertainty depends on the type of smuggler vessel (e.g., SPSSs can be extremely difficult to detect, and thus the corresponding PoA surfaces reflect this in long and broad bands of probability capturing spatial and temporal uncertainty) and/or on departure time uncertainty.
In the scenario, ten P-3 surveillance assets were considered as available for allocation
during the planning horizon. The home bases of individual surveillance assets are
detailed in Table 3.3. Each asset carries two different types of sensors with performance
parameters detailed in Table 3.4.
We simulated the scenario with a granularity of one hour (i.e., the forecasted
Figure 3.7: Chart displaying when each smuggler case is active over the 72 hour time horizon. Cases remain active through K = 72 and do not necessarily end at that time; rather, they are truncated at the time horizon of the forecast data.
Table 3.3
Asset Home Base Locations (Longitude, Latitude)
Assets 1, 6:  (-69.7617, 18.5036)
Assets 2, 7:  (-79.3833, 9.07111)
Assets 3, 8:  (-85.5442, 10.5931)
Assets 4, 9:  (-89.0558, 13.4406)
Assets 5, 10: (-92.37, 14.7942)
surfaces were for each hour, on the hour; thus ∆ = 1 h). The forecasts extended to 72 h from the current time (i.e., K = 72), and an asset allocation solution (i.e., x_{ijk} = 0 or 1) was required for each time epoch k in order for the algorithm to terminate.
Note that we omit E-B&C for large size scenarios in our results due to an exponential
increase in computation times. Therefore, for E-B&C, we compute the solution for
scenarios involving only up to 5 assets and 10 cases.
Table 3.4
Sensor-to-Target Sweepwidth (nm)
Using the aforementioned parameter values, we ran the simulation for all the approaches to schedule the ten specified assets over the 72 h planning horizon. Tables 3.5–3.7 show the cumulative probability of successful detection (CPoSD) for the GB&C-II method for Objectives O1, O2 and O3, respectively. Parallel GB&C-II produces the same result as sequential GB&C-II; therefore, we omit Parallel GB&C-II from the quality comparison.
We refer to Tables 3.5–3.7 as COA matrices [160]. The COA matrices aid the DM in understanding the reasoning behind the algorithm's behavior and its output by giving metrics both for individual asset-case pairs and overall: the probability that an asset detects at least one case (PDC) and the probability that a case is detected by
Table 3.5
Objective O1: Maximize Weight of Contraband Detected
Asset Case 1 Case 2 Case 3 Case 4 Case 5 Case 6 Case 7 Case 8 Case 9 Case 10 P DC
1 - - - - - - - 0.76 - - 0.76
2 - - - - - - 0.43 0.29 - - 0.60
3 0.09 - - - - - 0.25 0.29 - - 0.52
4 0.15 - - - - 0.28 - - - 0.14 0.47
5 0.21 0.14 - - - - - - - - 0.33
6 - - - - - - - 0.61 - - 0.61
7 - - - - - - 0.40 - - - 0.40
8 - - 0.09 - - - 0.21 - - - 0.28
9 0.17 - - - - 0.30 - - - - 0.42
10 - - - 0.20 - 0.11 - - - 0.06 0.33
PDA 0.49 0.14 0.09 0.20 - 0.55 0.80 0.95 - 0.19 (a, b, c)
(a) Expected weight of contraband disrupted: 7,828 kg
(b) Expected number of detections: 3.41
(c) Expected number of smugglers: 8.21
Table 3.6
Objective O2: Maximize Number of Detections
Asset Case 1 Case 2 Case 3 Case 4 Case 5 Case 6 Case 7 Case 8 Case 9 Case 10 P DC
1 - - - - - - - 0.76 - - 0.76
2 - 0.27 - - - 0.38 - - - 0.15 0.61
3 0.20 - - - - - 0.28 - - - 0.43
4 0.15 - - - - - - 0.10 - 0.15 0.35
5 0.19 - - 0.17 - - - - - - 0.33
6 - - - 0.13 - - - - - 0.20 0.31
7 - 0.29 - - - - - 0.37 - - 0.55
8 0.13 - - - - - - - 0.23 - 0.33
9 - - 0.10 - 0.18 - - - - - 0.25
10 0.17 - - - - - - - - 0.05 0.21
PDA 0.61 0.48 0.10 0.28 0.18 0.38 0.28 0.86 0.23 0.46 (a, b, c)
(a) Expected weight of contraband detected: 6,619 kg
(b) Expected number of detections: 3.85
(c) Expected number of smugglers: 8.67
Table 3.7
Objective O3: Maximize The Number of Smugglers Detected
Asset Case 1 Case 2 Case 3 Case 4 Case 5 Case 6 Case 7 Case 8 Case 9 Case 10 P DC
1 - - - - - 0.21 - 0.61 - - 0.69
2 - - - - - 0.34 0.26 - - 0.15 0.59
3 0.09 0.23 - - - - 0.25 - - - 0.48
4 0.11 - - 0.21 - - 0.10 - - - 0.37
5 0.18 - 0.10 - - - - - - - 0.25
6 - - - - - - - 0.58 - 0.15 0.64
7 - 0.27 - - - - - - - 0.11 0.35
8 - - - - 0.18 - - - 0.22 - 0.36
9 0.16 - - - - 0.30 - - - - 0.41
10 - - 0.09 0.16 - - - - - - 0.23
PDA 0.44 0.44 0.17 0.34 0.18 0.63 0.51 0.84 0.22 0.35 (a, b, c)
(a) Expected weight of contraband detected: 7,036 kg
(b) Expected number of detections: 4.13
(c) Expected number of smugglers: 9.37
at least one asset (PDA). These matrices may be generated to assess the allocation performance at a particular time epoch or, as shown in Tables 3.5–3.7, the cumulative asset allocation performance up to that point in time (in Tables 3.5–3.7, through K = 72).
Solving with respect to Objective O1 (Table 3.5) resulted in an asset allocation
with the highest expected weight of contraband detected, totaling 7,828 kg of cocaine
compared to Objectives O2 and O3 (Tables 3.6 and 3.7). This implies that we have a
64% success rate of detecting the transport of contraband with respect to the total
possible for the experimental scenario of 12,200 kg of contraband. The asset allocations
with respect to Objective O1 have 15.5% and 10.1% more contraband disrupted when
compared to Objectives O2 and O3, respectively. In Table 3.5, Case 8 has the largest amount of contraband (5,000 kg), with a CPoSD of 0.95. Solving with respect to Objective O3 resulted in the detection of a higher expected weight of contraband
(5.9%), expected number of detections (6.8%), and expected number of smugglers
(7.5%) compared to Objective O2. This could be caused by the uniform priority weight
vector used in objective O2.
For the sake of compactness, we omit the COA matrices demonstrating the performance of the other approaches implemented; instead, we quantify the goodness of each allocation by comparing the algorithms with the GB&C-II algorithm, as measured by the expected weight of contraband detected, the expected number of detections, and the expected number of smugglers detected.
The sums of the totals for each objective for each algorithm are shown in Table 3.8.
Fig. 3.9 shows a normalized representation of the results detailed in Table 3.8, where
the largest possible number of detections and contraband detected was utilized as a
basis for normalization of both metrics, respectively, to compare the expected number
of detections and contraband weight detected. Note that Fig. 3.9 only contains the
results for 1SLADP-I and 1SLADP-II; the detailed solutions of mSLADP with m > 1
are shown later in Section 3.4.3.
Table 3.8 and Fig. 3.9 show that all branch-and-cut-based algorithms optimizing Objective O2 are outperformed, in terms of both the expected number of detections and the expected number of smugglers, by the same algorithms optimizing Objective O3. When comparing GB&C-I and GB&C-II, optimizing with respect to Objective O2 resulted in a 4% lower expected number of detections and a 1.2% lower expected number of smugglers than optimizing with respect to Objective O3.
In terms of the amount of contraband detected, the GB&C-I algorithm obtained the highest expected amount of contraband detected when solving for Objective O1; however, its solutions for maximizing the expected number of detections or the expected number of smugglers were inferior to those of the other algorithms.
Figure 3.9: A normalized view comparing the performance of all the algorithms, with
respect to the expected weight of contraband disrupted (O1 ), the expected number of
interdictions (O2 ), the expected number of smugglers (O3 ).
In this section, we use Objective O1 for the scalability studies with respect to the
number of assets. To measure the scalability, we limited the number of assets available
for allocation for the scenario from 1 to 10 aircraft. Figs. 3.10 and 3.11 show the
expected weight of contraband disrupted and the runtimes, respectively. The detailed
Table 3.8
Algorithm Comparison
values are in Tables 3.9 and 3.10. In Fig. 3.10 and Table 3.9, we see that approximate
dynamic programming-based algorithms (1SLADP-I, 1SLADP-II, 2SLADP-I and
2SLADP-II) are able to obtain similar amounts of contraband disrupted, differing by
only up to 339.7 kg (4.6%) of contraband.
Similarly, the branch-and-cut-based algorithms (E-B&C, GB&C-I and GB&C-II)
are able to obtain similar amounts of contraband disrupted, differing by only up
to 279.1 kg of contraband among the three. E-B&C, intuitively, outperformed the
other branch-and-cut variations (and all other algorithms for that matter) among the
scenarios simulated until runtime became an issue. GB&C-II is able to obtain a better
result compared to GB&C-I when there are 2, 6, or 7 assets available for allocation. This is due to the nature of the scenario and the characteristics of the smuggler cases. Since GB&C-I iterates through all available assets, cases closer to the assets' home bases tend to be allocated first, since they require less travel time and are therefore more rewarding. In turn, this may limit the options available to assets considered for allocation in later iterations, since cases previously in close proximity to their home bases may already be allocated; the remaining cases, due to longer travel times, will be much less rewarding or infeasible. A similar problem arose with the 1SLADP-I algorithm, which obtains less expected contraband disrupted than the 1SLADP-II algorithm when there are more than 6 assets available for allocation, differing by up to 314.9 kg of contraband. We are able to mitigate this problem by applying a 2-step lookahead strategy: the 2SLADP-I algorithm obtains less expected contraband disrupted than the 2SLADP-II algorithm when there are more than 5 assets available for allocation, differing by only up to 197.6 kg of contraband.
As Fig. 3.11 and Table 3.10 show, E-B&C has the slowest runtime. There are maximum speedups of 34.8, 120.6, 210.6, 4,794, 6,146, 2,861, and 3,711, and average speedups of 9.8, 30.9, 53.7, 1,177, 1,542, 809, and 994, when comparing the runtime of E-B&C with those of GB&C-I, GB&C-II, Parallel GB&C-II, 1SLADP-I, 1SLADP-II, 2SLADP-I, and 2SLADP-II, respectively. Over all the asset availability scenarios tested, GB&C-II, Parallel GB&C-II, 1SLADP-I, 1SLADP-II, 2SLADP-I, and 2SLADP-II are, on average, 3.6, 6.1, 87, 143, 52, and 72 times faster than GB&C-I, respectively.
Our key finding here is that, with a 1.6% sacrifice in optimality on average, GB&C-
II provides a solution nearly identical to that of E-B&C, while offering a solution in
a fraction of the time (up to nearly 210.6 times faster among the simulated results).
Figure 3.10: The expected weight of contraband disrupted for each algorithm by varying
the number of available assets.
Alternatively, at a cost of 2.5% suboptimality on average, but with a speedup of up to 6,146 times, we can run 1SLADP-II for a given scenario. Similarly, at a cost of 2.4% suboptimality on average, 2SLADP-II offers a speedup of up to 3,711 times.
In general, GB&C-II should be used when the total number of assets is less than 3, due to its minimal sacrifice in optimality (1.6% on average). When the number of assets is greater than 3, 2SLADP-II should be used.
Here, we vary the number of cases from 1 to 10, while fixing the number of available
assets to 10. Figs. 3.12 and 3.13 show the expected weight of contraband disrupted and
the runtimes, respectively. The detailed values for each figure are in Tables 3.11 and
Figure 3.11: The CPU runtimes for each algorithm by varying the number of available
assets.
Table 3.9
Expected Weight of Contraband Disrupted (kg) for Varying Asset Availability
3.12, respectively. From Fig. 3.12 and Table 3.11, we see that all the algorithms have very similar solution quality, obtaining similar amounts of expected contraband disrupted, with a noticeable increase in contraband disruption once case 8 (5,000 kg of contraband) becomes available.
Table 3.10
Simulation Runtime (s) for Varying Asset Availability

# of Assets   E-B&C   GB&C-I   GB&C-II   Parallel GB&C-II   1SLADP-I   1SLADP-II   2SLADP-I   2SLADP-II
1             3.23    3.70     3.48      1.86               0.08       0.08        0.09       0.08
2             15.1    10.9     7.54      4.49               0.18       0.18        0.16       0.17
3             69.6    22.8     11.8      6.75               0.38       0.26        0.30       0.29
4             418     45.7     16.7      9.82               0.53       0.36        0.51       0.47
5             2639    75.9     21.9      12.5               0.55       0.43        0.92       0.71
6             -       89.4     23.0      13.9               0.94       0.53        1.58       1.03
7             -       111      25.5      15.1               1.46       0.66        2.91       1.62
8             -       139      30.3      16.4               1.35       0.83        4.91       2.22
9             -       181      31.0      18.32              1.73       0.93        7.75       2.90
10            -       220      34.7      20.9               2.22       0.99        13.0       5.50
Fig. 3.13 and Table 3.12 show the runtimes. As expected, GB&C-I has the slowest runtimes, while the 1SLADP-II algorithm has the fastest (< 1 s). There are maximum speedups of 7, 11, 99, 221, 17, and 40 when comparing the runtime of GB&C-I with those of GB&C-II, Parallel GB&C-II, 1SLADP-I, 1SLADP-II, 2SLADP-I, and 2SLADP-II, respectively. On average, the speedups of GB&C-II, Parallel GB&C-II, 1SLADP-I, 1SLADP-II, 2SLADP-I, and 2SLADP-II over GB&C-I were 4.3, 6, 33, 71.8, 6, and 16.5 times, respectively.
The key point here is that the algorithm 2SLADP-I is very efficient and is recom-
mended for scenarios when the number of cases is less than or equal to the number of
assets, which is often the case.
Figure 3.12: The expected weight of contraband disrupted for each algorithm by varying
the number of available cases.
Table 3.11
Expected Weight of Contraband Disrupted (kg) for Varying Case Availability
Figure 3.13: The CPU runtimes for each algorithm by varying the number of available
cases.
Table 3.12
Simulation Runtime (s) for Varying Case Availability

# of Cases   GB&C-I   GB&C-II   Parallel GB&C-II   1SLADP-I   1SLADP-II   2SLADP-I   2SLADP-II
1            0.85     0.34      1.08               0.68       0.23        1.99       0.86
2            4.09     1.36      4.47               0.66       0.32        3.74       1.27
3            5.88     2.67      1.89               0.85       0.42        4.13       1.58
4            11.8     3.93      3.06               1.00       0.54        5.24       1.91
5            18.0     4.77      4.01               1.48       0.57        5.51       2.16
6            24.7     7.48      5.31               1.34       0.64        6.95       2.38
7            56.0     11.7      7.27               1.45       0.72        7.79       2.55
8            108      13.2      8.56               1.67       0.80        9.45       3.13
9            183      15.9      9.85               1.92       0.89        11.9       4.00
10           220      34.7      20.9               2.22       0.99        13.0       5.50
To evaluate the robustness of the allocations, we conducted 100,000 Monte Carlo simulations of the scenario. Sampling from the PoA surfaces, we obtained waypoints for each smuggler at each time epoch and joined them together to obtain a full path. From these paths, we measured whether the smuggler traversed any allocated patrol box during the allocated search time and, if so, the aircraft's probability of detecting the target during those time epoch(s). Table 3.13 shows the detailed performance statistics for each algorithm over the 100,000 Monte Carlo simulations. A useful metric for measuring an algorithm's goodness is the nominal-the-best signal-to-noise ratio (SNR) [138], that is,

$$SNR = 10 \log_{10} \frac{\mu^2}{\sigma^2} \qquad (3.33)$$
Table 3.13
Monte Carlo Analysis (from 100,000 runs)

Algorithm   Mean of Contraband Detected (µ) in kg   Std. Dev. of Contraband Detected (σ) in kg   SNR (dB)
GB&C-I      7,616                                   246.3                                        29.8
GB&C-II     7,632                                   252.5                                        29.6
1SLADP-I    7,645                                   244.2                                        29.9
1SLADP-II   7,610                                   218.4                                        30.8
2SLADP-I    7,648                                   240.3                                        30.1
2SLADP-II   7,612                                   203.3                                        31.5
Chapter 4
4.1 Introduction
4.1.1 Motivation
Navy planners strive to optimize ship routes with respect to multiple objectives, e.g.,
fuel efficiency, time, distance, safety, etc. When the task is one of trying to optimize
multiple objectives, humans are notoriously poor at decision making, especially if
the task is dynamic and has inherent uncertainty [2, 25, 99, 162]. Consequently,
decision support tools are needed to collaboratively optimize routes by evaluating
and recommending multiple courses of action (COAs) from which a navy planner
can select one. To support such mixed-initiative planning, the tool(s) must aid the
human planner to create COAs and evaluate his or her own plan against optimized
ones, or to combine both human expectation of the forecast, geographic hazards, and
possible uncertainty with the automated algorithm output for hybrid human-machine
consensus on what routes to consider for one or more shipping vessels (as in the case
of aircraft strike group path routing). Although the problem is formulated for ship
routing, the path planning algorithm is applicable to unmanned aerial vehicle and
helicopter routing, among other Navy missions.
The scope of this chapter is limited to many-objective ship routing problems in uncertain environments, where "many-objective" refers to 15 or more objectives to be simultaneously optimized and traded off. Such problems are rarely undertaken in practice due to their computational complexity. Motivated by carrier strike group missions and fuel cost optimization, which also fall under the category of many-objective ship routing, we treat different weather parameters as individual objectives to be optimized (e.g., minimize relative wind speed, current speed, etc.). Each such weather-based objective involves varying spatio-temporal uncertainty over a finite horizon, with some correlation among objectives. Thus, our problem can be succinctly stated as follows: given a graph (e.g., as in [158]), a departure point, and a destination point, find a representative set of Pareto optimal shortest paths in a reasonable amount of time while optimizing as many as 15 or more objectives.
The ship routing problem falls under the rubric of the multi-objective shortest path problem under uncertainty with time windows, with speed and bearing as additional control variables; that is, a problem with time-varying, stochastic, and non-convex costs at nodes and along arcs in the network, whose evolution occurs on a timescale similar to that of the ship's transit. What makes the problem intractable is that arc costs are time-dependent, non-convex, and many-dimensional. The complexity of the problem space and the concurrent constraints render the majority of provably optimal multi-objective approaches unusable, thus necessitating tools that allow for navigational planning and replanning that consider both the economic and practical needs of naval and commercial shipping [158].
The single-objective shortest path problem is widely studied in the literature and was researched extensively, for example in [15, 43, 49, 70, 100, 122, 156], from the mid-1950s to the late 1960s. Optimizing a path with two or more objectives that are usually in conflict yields the multi-objective path planning problem, which is a key component of the ship routing problem considered here. Solution approaches to multi-objective path planning in the literature primarily fall into one of two categories: 1) generation methods, and 2) conversion to a single-objective shortest path problem.
Generation method refers to the direct generation of the Pareto front by solving
the shortest path problem. In this vein, Hansen [69] first examined the case of two
objectives and the concomitant computational complexity of the problem. Based on
Hansen’s work, Henig [75] proposed a dynamic programming approach, where perfor-
mance improvements were obtained when the arc costs are quasiconcave/quasiconvex.
Kostreva and Wiecek [94] proposed a generalized dynamic programming approach
(both backward and forward) to obtain multi-objective shortest paths on networks
with (known) time-dependent arc costs.
Aside from these approaches, a majority of the research around multi-objective
shortest path problems are of the label setting (e.g., [109]) or label correcting variety.
Martins’ algorithm [109] is a label setting algorithm in the spirit of Dijkstra’s shortest
path algorithm, but in lieu of a single cost, a label with multiple entries, corresponding
to each of many objective costs, is set on each vertex. Stewart and White extended
the A* algorithm to a multi-objective variant (MOA*) [165], where they devised
an intelligent method to select nodes to expand as part of their algorithm. Most
recently, Mandow proposed a new approach to multi-objective A* (NAMOA*), where
the algorithm smartly expands selected paths using various heuristic evaluation
functions [106, 107]. We use this method as a reference method for comparison with
the NAPO algorithm of this chapter.
Generation of the full Pareto-frontier suffers from rapidly increasing computation
time and storage due to the NP-hard nature of the multi-objective shortest path
problem [63,154]. Therefore, approximation methods for rapidly generating the Pareto-
frontier are desired to make them practical in real-world applications. Warburton [178]
approximated the Pareto-frontier and bounded the problem complexity to polynomial
time by introducing the ε-nondominated scaling procedure. Based on Warburton's work, Hassin presented two alternative approximation algorithms with fully polynomial complexity [71]. Even with these modifications, the computational complexity can remain intractable.
The second category of multi-objective path planning bypasses direct generation: it converts the multi-objective shortest path problem into a single-objective shortest path problem, either through a utility function [120, 151] or through objective weights based on user preference [38, 62]. This approach may be the fastest in terms of computational runtime; however, these methods often significantly reduce the diversity of the resulting Pareto front, that is, the number of distinct routing options available to the navigator can be substantially smaller than what could otherwise be obtained.
In this chapter, we present a fast approximate method for Pareto-front generation utilizing a combination of approximate dynamic programming (ADP) techniques (i.e., one-step lookahead, rollout) and clustering techniques (i.e., Gaussian mixture models (GMMs) and the silhouette score). This combination substantially reduces the computation time while enlarging the number of distinct courses of action the navigator can choose from.
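A minimal sketch of the clustering step is given below, assuming scikit-learn and that candidate paths are summarized by their objective-cost vectors; the model-selection loop over component counts is an illustrative choice, not necessarily the dissertation's exact procedure.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def cluster_paths(path_costs, max_components=10, seed=0):
    # Fit GMMs with increasing component counts to the (num_paths x
    # num_objectives) cost matrix; keep the labeling with the best silhouette.
    X = np.asarray(path_costs, dtype=float)
    best_labels, best_score = None, -1.0
    for m in range(2, max_components + 1):
        gmm = GaussianMixture(n_components=m, random_state=seed).fit(X)
        labels = gmm.predict(X)
        if len(set(labels)) < 2:          # silhouette needs >= 2 clusters
            continue
        score = silhouette_score(X, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score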
In this chapter, accurate short and medium range weather predictions are used in
conjunction with ship models (e.g., how a ship’s speed may be impacted by the
expected wind, wave, and current conditions). Commercial ship voyage planning
modules are used to calculate the impacts of weather, ship’s hull form, cargo, and
(power) plant characteristics on fuel costs. Broad categories of impacting weather
include, but are not limited to, winds, waves, and currents. Among wind features, wind
speed and direction are used in impact calculation. Among wave features, amplitude,
period, and direction are considered. Additionally, current direction and speed are
used in ship impact calculations. Environmental parameters are forecast by multiple
models [13, 170, 181].
Besides weather, bathymetry data is crucial in calculating optimized ship routes. Bathymetry, extracted from the Oceanographic Atmospheric Master Library (NRL-MRY, NRL-Stennis Space Center), is divided into four categories: 1) shallow water, 2) water deeper than a certain threshold (in this chapter, twelve feet), 3) land, and 4) unknown. From this data, TMPLAR extracts possible safe paths that do not cross over land, in alignment with the navigator's preference to allow travel over shallow water or not, and routes ships in a mixed-initiative decision making cycle.
A high priority concern for navigators is to save on fuel to reduce expenditures, while
also increasing operational endurance and asset availability [41]. Fuel cost calculation
is complex and involves the ship’s hydrodynamics, nonlinear combinations of model
parameters, and one or more exogenous variables. TMPLAR utilizes the Smart Voyage
Planning Decision Aid (SVPDA), explained in detail in [117], for fuel consumption
calculation. Input parameters include swell heights and periods, surface wind speeds
and directions, wave directions, heights and periods, and current speeds and directions.
In high fidelity fuel consumption models, relative wind and sea-state calculations have
a direct impact on ship speed depending on the ship’s bearing, e.g., the wind and
current may aid a ship along its course if its bearing is the same; however, if the ship
is against the wind/current direction, it will have to expend much more fuel to get to
its destination in time. Fuel consumption is thus highly sensitive to the ship’s speed
(both when traveling at slow and fast speeds) [37].
The overall power needed to maintain a speed from one node to the next is
calculated as in (4.1),

P_Total = P_CW + P_Sea + P_Swell + P_Wind   (4.1)

where P_Total represents the total power required, and P_CW represents the power
required to navigate at the specified ship speed in calm water and current. Here,
P_Sea, P_Swell, and P_Wind represent the additional resistances due to the sea, swell, and wind
components, respectively. P_CW is dependent on the relative direction and velocity of
the current with respect to the ship; P_Sea, P_Swell, and P_Wind are similarly so, using
direction, height, and period of the sea and swell(s), respectively, for the former
two components, and speed and direction for the latter. SVPDA even considers the
ambient air temperature to incorporate the effect of temperature on HVAC (heating,
ventilation, and air conditioning) loads. Details on ship’s power calculation can be
found in [37, 61, 83, 103].
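As a concrete, deliberately simplified illustration of (4.1), the Python sketch below adds the four power components and computes the relative flow angle that drives the aiding/opposing behavior described above; the function names and interfaces are hypothetical, not SVPDA's.

def relative_angle_deg(ship_bearing_deg, flow_toward_deg):
    # Smallest angle between the ship's course and the direction a flow
    # (wind or current) moves toward; 0 deg means the flow aids the ship.
    d = abs(ship_bearing_deg - flow_toward_deg) % 360.0
    return min(d, 360.0 - d)

def total_power(p_cw, p_sea, p_swell, p_wind):
    # Total required power per (4.1): P_Total = P_CW + P_Sea + P_Swell + P_Wind.
    return p_cw + p_sea + p_swell + p_wind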
In conjunction with the navigator's input, the problem space is set up as follows. The
coordinates of a departure point and a destination are specified, and a Great Circle
route¹ is constructed between the two (assuming no land obstacles prevent doing so).
Using the Great Circle route as a basis, a specified number of "stages" are inserted
equi-spaced along the track.

¹A Great Circle route is the shortest-distance route between two points lying on the surface of a sphere, often used in ship navigation and air travel. Its length is also known as the geodesic distance.

Additionally, within each stage, a specifiable number of nodes are inserted
at a predefined distance cross track (perpendicular) on each side of the Great Circle
route. In this manner, a navigator-definable multi-stage grid is constructed. The start
and end stages consist of one node each and the specifiable number of nodes (including
the node associated with the Great Circle route) is added to each stage. The resulting
grid system is a trellis with a finite number of stages, wherein each node in one stage
is connected to all the nodes in the next. This provides a grid system for finding the
Pareto optimal paths between the source and the destination.
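A rough Python sketch of this trellis construction is given below; it approximates the Great Circle baseline by linear interpolation in latitude/longitude for brevity (a real implementation would interpolate along the geodesic), and all names are illustrative.

import numpy as np

def build_trellis(start, goal, n_stages=6, nodes_per_side=5, cross_nm=20.0):
    # Stage centers, equi-spaced along a straight-line stand-in for the
    # Great Circle track (flat-earth approximation).
    lat = np.linspace(start[0], goal[0], n_stages + 2)
    lon = np.linspace(start[1], goal[1], n_stages + 2)
    track = np.array([goal[0] - start[0], goal[1] - start[1]])
    perp = np.array([-track[1], track[0]]) / np.linalg.norm(track)
    deg_per_nm = 1.0 / 60.0   # rough conversion: 1 nm is about 1/60 deg of latitude
    offsets = np.arange(-nodes_per_side, nodes_per_side + 1) * cross_nm * deg_per_nm
    stages = [[(lat[0], lon[0])]]                       # single start node
    for s in range(1, n_stages + 1):                    # cross-track nodes per stage
        stages.append([(lat[s] + o * perp[0], lon[s] + o * perp[1]) for o in offsets])
    stages.append([(lat[-1], lon[-1])])                 # single destination node
    return stages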
Ship safety is the highest priority among all objectives the navigator considers
when routing ships. When searching for viable paths (arcs) to traverse between stages,
bathymetry and weather conditions are checked both at the node locations and along
the connecting arcs. Doing so reduces the problem space further, eliminating options
that should not be explored due to infeasibility or safety concerns. Due to the severity
of the consequences if a ship is not routed safely, hard thresholds/constraints are
enforced. If any threshold is exceeded, the location is removed from consideration,
e.g., if the wave height exceeds a threshold specified by the planner, the corresponding
node and/or arc is removed from the problem space. The safety constraint is thus
modeled as a Heaviside function, where there are two types of nodes/arcs: those that
are passable by the ship and those that are impassable.
4.3 Problem Formulation
Adapting the formulation from [158], let G = (N, E) be the trellis network. Because
the forecast is deterministic, a forward dynamic programming formulation is feasible;
we define its elements below.
4.3.1.1 States
Let xs be the two-dimensional state which consists of the node ns at stage s and the
arrival time τs . Note that the arrival time at a node is the same as the departure
time at a node. That is, generally for the current stage s, the state can be written as
follows.
xs = (x1,s , x2,s ) = (ns , τs ) (4.2)
4.3.1.2 Controls
Let S be the required number of nodes in a path connecting the origin and the
destination. The control variables at stage s are: 1) which node ns+1 to traverse to;
2) the power plant configuration ρs needed to efficiently traverse to the next stage
s + 1 departing at τs . Control variables determine the speed and bearing of the ship
at node ns . That is,
us = (u1,s , u2,s ) = (ns+1 , ρs ) (4.3)
4.3.1.3 Transition Dynamics

The state evolves as

x_{s+1} = (x_{1,s+1}, x_{2,s+1}) = (n_{s+1}, τ_{s+1}),   τ_{s+1} = τ_s + b(x_s, u_s)

where b is the transit time from node n_s at stage s such that we arrive at node n_{s+1}
at time τ_{s+1} in stage s + 1. Note that τ_{s+1} ∈ (0, T] is an integer multiple of a time
resolution ∆ > 0 and T ≥ ∆ is a given integer denoting the maximum amount of time
specified to transit the route. We assume the transit time to be nonnegative.
We denote the d-dimensional cost to traverse arc ⟨x_{1,s}, x_{1,s+1}⟩ as c(x_s, u_s), where d is
the total number of objectives to consider, and ci (xs , us ) as the cost pertaining to a
particular objective i ∈ {1, 2, . . . , d}. The costs ci (xs , us ), i ∈ {1, . . . , d} are assumed
to be nonnegative. The power plant configuration u2,s is used for both fuel efficiency
and to achieve certain top speeds, given weather impacts (i.e., it is a proxy for time of
arrival τs+1 ).
We now define the shortest path (with respect to objective i) from the start
node n_1 to a node n_{s+1} in stage s + 1 ≤ S, s ≠ 1, as J^{i*}_{s+1}(x_{s+1}). The cost is found by
solving (4.7),

J^{i*}_{s+1}(x_{s+1}) = min_{u_s} [ J^{i*}_s(x_s) + c_i(x_s, u_s) ]   (4.7)
subject to the constraints (4.8)–(4.11), including

u_{2,s} ∈ P   (4.9)

where P is the set of allowable power plant configurations, and N_a and E_a are the
sets of safe nodes and arcs, respectively (i.e., for a given ship class, those nodes/arcs
whose bathymetry is of a certain depth or greater). The recursion is initiated with
the initial condition J_1(x_1) = 0. The constraints require arriving by a time τ_{s+1} that
satisfies (4.8), choosing an allowable plant configuration (4.9), and traversing only
feasible safe nodes and edges in the network, n_{s+1} ∈ N_a (4.10) and ⟨n_s, n_{s+1}⟩ ∈ E_a (4.11).
Although constraint (4.8) results in a large problem space, we show later that, by
exploiting time discretization and earliest possible arrival/latest allowable departure
times via forward-backward Dijkstra algorithm, we can significantly reduce the required
computation.
To solve the multi-objective problem, we use a labeling algorithm and find the
set of Pareto-optimal labels (solutions) for each stage s. Let label g^{s,ℓ*}(x_s), ℓ = 1, . . . , L,
of the L Pareto-optimal labels in stage s, be the vector of accumulated objective costs
along the corresponding path, as in (4.12); each label g^{s,ℓ*} has cardinality equal to
the number of objectives d.
Adapting the multi-objective notation from [106], a set of one or more labels is
stored at each node. In general, a cost vector g_ℓ is said to dominate g_ℓ′, denoted as
g_ℓ ≺ g_ℓ′, if and only if

g_ℓ ≺ g_ℓ′ ⟺ (∀i: J^i_ℓ ≤ J^i_ℓ′) ∧ g_ℓ ≠ g_ℓ′   (4.13)

where J^i denotes the ith element of the label (or cost vector) g_ℓ. Then, given a set of
vectors, denoted by F, we define nondom(F) as the set of nondominated vectors
in set F, that is,

nondom(F) = { g^{ℓ*} ∈ F | ∄ g^{ℓ′} ∈ F such that g^{ℓ′} ≺ g^{ℓ*} }   (4.14)

(i.e., there does not exist a cost vector g^{ℓ′} that dominates g^{ℓ*} in label set F). Then, by
augmenting the control vector to include ℓ as the third control variable, and letting
F_{s+1}(x_{s+1}) be the nondominated label set over state x_{s+1}, we can rewrite (4.7) as

F_{s+1}(x_{s+1}) = nondom_{u_s} { g^{s,ℓ*}(x_s) + c(x_s, u_s) }   (4.15)

subject to (4.8)–(4.11).
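A direct, if naive, O(L²) implementation of the nondom(·) operator of (4.13)–(4.14) might look as follows in Python; labels is a hypothetical array whose rows are label cost vectors.

import numpy as np

def nondom(labels):
    # Keep the rows of a (num_labels x d) cost array that no other row
    # dominates in the sense of (4.13).
    keep = []
    for i, g in enumerate(labels):
        dominated = any(
            np.all(h <= g) and np.any(h < g)        # h dominates g
            for j, h in enumerate(labels) if j != i
        )
        if not dominated:
            keep.append(i)
    return labels[keep]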
We propose a 1-Step Lookahead (1SL) combined with rollout strategy to solve the
dynamic programming problem in (4.15). To further reduce the problem complexity,
we utilize a Gaussian mixture model (GMM), along with the silhouette score, to
cluster the potential nondominated paths. A subset of the paths are then selected to
be the representative paths for the next stage.
Without loss of generality, we limit the time window to take on only integer multiples
of the time resolution ∆ > 0. Let T(ns , ns+1 ) be the minimum time to traverse
from ns to ns+1 . To incorporate deadlines and feasibility, the problem is then to find
optimal times of arrival and departure at each node subject to (4.8)–(4.11), while also
satisfying (4.16), which corresponds to enforcing an earliest possible time of arrival at
each node and a latest allowable departure from each node to reach the destination
nS by time T .
T(n_1, n_s) ≤ τ_s ≤ T − T(n_s, n_S)   (4.16)
The upper and lower bounds in (4.16) can be computed via Dijkstra’s algorithm.
Thus, the time window at n_S is [⌈T(n_1, n_S)/∆⌉·∆, ⌊T/∆⌋·∆], where ⌈·⌉ and
⌊·⌋ denote the ceiling and floor functions, respectively. Instituting time windows at
each node in the network, τ_s is constrained accordingly.
That is, we may only consider discretized times that fall within the time window, at
time intervals of arbitrary length ∆, for each node when deciding the time to depart
the previous node and the time to arrive at the next, given the latest allowable time
of arrival, T , at nS . Thus, the time window on the destination propagates through
the network.
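The window computation can be sketched with two Dijkstra passes, one forward from n_1 and one backward from n_S over the reversed arcs; the adjacency-list format below is illustrative.

import heapq

def min_times(adj, src):
    # Dijkstra over nonnegative arc transit times; adj[u] -> list of (v, time).
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, t in adj.get(u, ()):
            if d + t < dist.get(v, float("inf")):
                dist[v] = d + t
                heapq.heappush(pq, (d + t, v))
    return dist

# fwd = min_times(adj, n1); bwd = min_times(reversed_adj, nS)
# Per (4.16), node n is feasible only if fwd[n] <= T - bwd[n]; its time
# window is [fwd[n], T - bwd[n]], discretized to integer multiples of Delta.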
To solve (4.15), we write the approximate dynamic programming equation using a one-step-lookahead heuristic cost-to-go.
In other words, the heuristic cost-to-go from stage s + 1 consists of the cost to travel
to the next stage’s optimal node and assumes optimal traversal for the remainder of
the path to the terminal node.
4.4.3 Gaussian Mixture Model and Silhouette Score to Reduce Problem Space
To further reduce the problem complexity, we cluster the non-dominated paths and
select a subset of paths for further expansion. Let Ls be the Pareto optimal set of
labels that collects all non-dominated labels at stage s by iterating over set Fs (xs ) for
all nodes n ∈ N_s, where N_s denotes the set of nodes in stage s, and over all feasible
departure times τ_s at stage s; that is, L_s is the set of non-dominated states available
at stage s. We then cluster the non-dominated paths associated with the labels in L_s
into between 2 and a maximum of K clusters (K is a hyperparameter) via Gaussian
mixture models (GMMs). The number of clusters is determined by the silhouette
score, which measures the quality of the GMM clusters. Rousseeuw [150] proposes
that each cluster have a corresponding
silhouette value that indicates which data points lie well within a cluster and those
that are between clusters. The average silhouette score for each solution represents
the clustering solution’s validity and can serve as the basis for selecting the number of
clusters that best fits the data points when utilizing a GMM clustering algorithm.
For each non-dominated path in L_s with the corresponding label g^{s,ℓ*}(x_s), let z(ℓ)
be the average dissimilarity of path ℓ with all other paths within its cluster k. This
can be any of several distance metrics, but for simplicity, let z(`) be the average
Euclidean distance from path ` to any other path within the same cluster. Let y(`)
denote the lowest average dissimilarity amongst all clusters. As with z(`), the distance
from path ` to all other paths not within the same cluster are calculated and the
lowest distance is then selected. Essentially, z(`) signifies how well a point is assigned
to a cluster and y(`) signifies which cluster path ` best fits in. The silhouette score,
w(`), for path `, is then defined as
w(ℓ) = ( y(ℓ) − z(ℓ) ) / max{ z(ℓ), y(ℓ) }   (4.23)
It is evident from (4.23) that −1 ≤ w(`) ≤ 1. If the silhouette score w(`) is 0, this
implies that path ` is on the border between two clusters. If w(`) is close to 1, path `
is badly matched to clusters other than the one it has already been assigned to and is
considered to be appropriately labeled; however, if w(`) is close to -1, it implies that
path ` should have been labeled as belonging to some cluster other than the one it
is currently assigned to. Utilizing the average silhouette value, given in (4.24), it is
possible to interpret the “goodness” of the GMM cluster scheme by aggregating all
the silhouette values for each label and taking the average.
w̄ = (1/L) Σ_{ℓ=1}^{L} w(ℓ)   (4.24)
Other metrics that may suit a given problem space include geometric or harmonic
averages.
For each stage, we cluster the non-dominated paths with the total number of
clusters ranging from 2 to K and use the number of clusters with the highest average
silhouette value. Then, we pick the top m paths from each cluster for the next
stage's calculations, where "top" refers to the highest silhouette scores.
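A sketch of this pruning step, using scikit-learn's Gaussian mixture and silhouette utilities on a 2-D array of label cost vectors, is shown below; the hyperparameter handling is illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, silhouette_samples

def cluster_and_prune(costs, K=10, m=3):
    # Fit GMMs with 2..K components, keep the clustering with the best
    # average silhouette score (4.24), then retain the m paths with the
    # highest per-path silhouette values (4.23) in each cluster.
    best = None
    for k in range(2, min(K, len(costs) - 1) + 1):
        labels = GaussianMixture(n_components=k, random_state=0).fit_predict(costs)
        if len(np.unique(labels)) < 2:
            continue
        score = silhouette_score(costs, labels)
        if best is None or score > best[0]:
            best = (score, labels)
    if best is None:
        return np.arange(len(costs))        # too few paths to cluster; keep all
    _, labels = best
    per_path = silhouette_samples(costs, labels)
    keep = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        keep.extend(idx[np.argsort(per_path[idx])[::-1][:m]])
    return np.array(sorted(keep))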
Additionally, we also consider λ stages of freedom, that is, we do not perform
the clustering technique when the algorithm is λ stages from the destination node.
In other words, only stages 1 to S − λ invoke GMM clustering with silhouette score
calculations. This increases the number of Pareto-optimal labels retained at the
destination.
The test scenario used a time resolution of ∆ = 15 minutes with a specified departure
time of midnight on August 1, 2017 at Roanoke Island, North Carolina, USA (36.0◦ N,
75.0◦ W), and a deadline of midnight on August 3, 2017 at Rock Sound, The Bahamas
(25.0◦ N, 76.0◦ W). A trellis was constructed between the start and destination, such
that there were 6 equi-spaced stages between the origin and destination, and with
a Great Circle route connecting through the middle. Five nodes were inserted to
the west and east of the Great Circle track, per stage, at a distance of 20 nautical
miles. In Fig. 4.1, the grid used for our analyses is illustrated. The test scenario
used forecast information available up to the departure time and coincided with the
impacts and aftereffects of Tropical Storm Emily, a rapidly-forming storm system
that passed through the Florida panhandle into the western Atlantic Ocean. The
comparison and analyses in this section were performed on an Intel Corporation Xeon
E3-1200 v5/E3-1500 v5/6th Gen Core Processor with 32 GB RAM.
The graph was generated and post-processed to remove nodes and arcs deemed
unsafe (in the form of passable or impassable nodes/arcs). We tested the NAPO
algorithm on up to 15 objectives and Table 4.1 shows the single-objective minimum
cost route computed using A* with the corresponding time for each objective. Table
Figure 4.1: Graph network with start and destination and the connecting edges, starting
off North Carolina and directed towards The Bahamas.
4.1 also shows that some objectives require more runtime than others. This is
likely due to the nature of the data, how it is stored, and the range of values associated
with the objective (e.g., there are physical limits in meteorology on how high the sea
height can be, but distance is an unbounded metric, since it is always possible to add
more waypoints to a graph).
We set λ = 2, K = 10, and m = 3; that is, we do not perform the clustering
technique for the last two stages of the graph, the silhouette score is evaluated when
attempting to cluster the label set into 2 to 10 clusters, and we select the 3 best paths
per cluster under the clustering with the best silhouette score. Note that a smaller K
significantly reduces the overall runtime.

Table 4.1
Single Objective Cost and Simulation Runtime
We ran the test scenario with both NAMOA* and the new approximate dynamic
programming-based Pareto optimization (NAPO) method proposed in this chapter.
Figure 4.2 shows the Pareto-optimal costs obtained by both algorithms. To compare
the two Pareto-fronts, we construct a vector consisting of all the single-objective
minimum costs as reference costs, that is, µ = [68.7, 7.11]′ for two objectives. Then,
we measure the percentage difference between each Pareto-point solution cost and the
corresponding reference cost.

Figure 4.2: Pareto-optimal costs obtained by NAMOA* and NAPO (fuel consumption, in kgal, on the horizontal axis).
The average percentage differences for NAMOA* are 0.16 and 0.16 for objectives 1
and 2, respectively. For NAPO, the averages are 0.27 and 0.15 for objectives 1 and 2,
respectively. The 88 Pareto-points found via NAMOA* were reduced to 36 Pareto-points,
with a relatively small sacrifice in optimality, via the proposed NAPO algorithm. We
performed the same test for 3 objectives, where the average percentage differences for
NAMOA* are 0.23, 0.15, and 2.49 for objectives 1, 2, and 3, respectively. For NAPO,
the averages are 0.22, 0.19, and 5.04 for objectives 1, 2, and 3, respectively. NAMOA*
output 608 Pareto-points for the 3-objective planning problem, while the proposed
NAPO algorithm terminated with 42 Pareto-points, with minimal sacrifice in optimality.
Table 4.2
Total Number of Pareto-points and Simulation Runtime Comparison for 2 to 3 Objectives
To gauge the scalability of the NAPO algorithm, we varied the number of objectives
from 2 to 15. A comparison of the number of Pareto-solutions and the simulation
runtimes between the two algorithms is shown in Table 4.2. Note that we were unable
to obtain solutions from NAMOA* for more than 3 objectives due to the prohibitive,
exponentially growing runtime of the algorithm. We list the remaining results for the
NAPO algorithm, with the number of objectives varied from 4 to 15, in Table 4.3.
Overall, the proposed NAPO algorithm's runtime was strongly correlated with the
number of GMM clusters. As Table 4.3 shows, we were able to approximate a Pareto
front for the complex 15-objective ship routing problem around the time of Tropical
Storm Emily in about 15 minutes, where the Pareto front comprised 866 solutions.
The runtimes of the two algorithms are illustrated in Fig. 4.3; no times were obtained
for NAMOA* beyond 3 objectives due to its intractability in the context of the ship
routing problem. The runtime of the NAPO algorithm grew approximately
sub-log-linearly, while NAMOA*'s runtime grew definitively exponentially.
Figure 4.3: Simulation runtime as the total number of objectives varies from 2 to 15.
Table 4.3
NAPO Algorithm Total Number of Pareto-points and Simulation Runtime for 4 to 15 Objectives

# of Objectives   Total # of Pareto-points   Simulation Runtime (s)
 4                 54                         341.5
 5                 51                         411.8
 6                282                         503.1
 7                221                         563.0
 8                120                         618.0
 9                664                         666.1
10                500                         686.1
11                173                         710.3
12                241                         735.9
13                114                         810.7
14                843                         857.2
15                866                         902.4
Chapter 5
5.1 Introduction
The Kalman filter (KF) [85] is the optimal state estimator for linear dynamic systems
driven by Gaussian white noise with measurements corrupted by Gaussian white
noise¹. In the classical design of a Kalman filter, the noise covariance matrices are
assumed known and they, along with the system dynamics, determine the achievable
filter accuracy. However, in many practical situations, including noisy feature data
in machine learning, the statistics of the noise covariances are often unknown or only
partially known. Thus, noise identification is an essential part of adaptive filtering.

¹The KF is also the best linear estimation algorithm when the noises are non-Gaussian with known covariances [11].
Adaptive filtering has numerous applications in engineering [121], machine learning [36],
econometrics [44], and weather forecasting [42, 79, 115, 149], to name a few.
We were motivated by the following learning problem: Given a vector time series
and a library of models of system dynamics for the data (e.g., a Wiener process,
a white noise acceleration model, also called nearly constant velocity model, or a
white noise jerk model, also called nearly constant acceleration model), find a suitable
process noise and measurement noise model and the best system dynamics for the
time series. The problem we consider in this chapter is limited to finding a suitable
process noise and measurement noise covariance for a given dynamic model.
The approaches for estimating the noise covariance matrices for a Kalman filter can be
broadly classified into four general categories: Bayesian inference, maximum likelihood
estimation, covariance-matching, and correlation methods. The first two categories
pose the noise covariance estimation problem as a parameter estimation problem.
In the Bayesian inference approach [77], the covariance estimation problem is
solved by obtaining the posterior probability density function (pdf) of the unknown
parameters (in this case, the noise covariance matrix elements) from their prior pdf and
the observed measurements using the Bayes’ formula recursively. In 2013, Matisko and
Havlena [110] proposed a new Bayesian method to estimate the unknown covariance
matrices. They first use a Monte Carlo method to generate a grid of possible unknown
covariance matrix pairs (Q, R) with more density near the highest prior probability.
Then, they compute the likelihood and posterior probability after performing state
estimation for each pair using a Kalman filter. In general, the Bayesian approach
suffers from the curse of dimensionality and is computationally intractable due to the
fact that it involves numerical integration or Monte Carlo simulations over a very
large parameter space.
In maximum likelihood estimation [88, 164], the noise statistics are obtained by
maximizing the probability density function of the measurement residuals generated
by the filter, which is the likelihood of the filter parameters [11]. These filter-based
maximum likelihood methods require nonlinear programming based optimization and
are computationally intractable. Shumway and Stoffer [157] utilize the expectation
maximization (EM) algorithm [48], which requires the smoothed estimates of the
system state. This approach starts with the smoothed estimation of the system state
given an estimate of the initial state and noise covariance matrices. Then, the unknown
parameters are estimated via maximum likelihood estimation using the smoothed state
estimates obtained from the expectation step. Later, Ghahramani and Hinton [60]
present an extension of [157] that can account for an unknown observation matrix in
linear dynamic systems. They then go on to use forward and backward recursions to
estimate the noise covariance matrices. This process is repeated until the estimated
parameters converge. In addition to computational complexity, this method suffers
from convergence to a local optimum.
The basic idea of the covariance-matching techniques [125] is that the sample
covariance of the innovations should be consistent with its theoretical value. In [125],
the unknown noise covariances are estimated from the sample covariance computed
from the innovation sequences accumulated over the entire historical data (or in a
moving time window). In this method, if the estimated innovation covariance value is
much larger than the theoretical value, then the process noise covariance is increased.
The convergence has never been proved for this method.
With regard to correlation methods, Heffes [74] derived an expression for the
covariance of the state error and of the innovations of any suboptimal filter as a
function of noise covariances. This expression serves as a fundamental building
block in the correlation methods. The first innovation-based technique to estimate
the optimal Kalman filter gain and the unknown noise covariance matrices via the
correlations of innovations from an arbitrary initial stabilizing filter gain was introduced
by Mehra [113]. Another procedure to carry out the identification of unknown optimal
Kalman filter gain and the noise covariance matrices is by Carew and Bélanger [31].
Their strategy calculates the Kalman filter gain based on the estimation error that
is defined as the discrepancy between the optimal state estimates obtained from
the optimal Kalman filter gain and the state estimates obtained from an arbitrary
suboptimal Kalman filter gain. There is a question as to whether the correlation
method is sensitive to the initial Kalman filter gain selection. Mehra suggested to
repeat the noise covariance estimation steps with the obtained gain from the first
attempt to improve the estimation. However, Carew and Bélanger [31] claim that if
the optimal Kalman filter gain is used as the initial condition, then the approximations
in Mehra’s approach are such that the correctness of the optimal gain will not be
confirmed.
Later, Neethling and Young [126] suggested to combine the noise covariance
matrices in a vector and solve a single least squares or weighted least squares problem
to improve the performance of Mehra and Carew–Bélanger’s approaches. In 2006,
Odelson et al. [131,132] developed the autocovariance least squares method to estimate
the noise covariance matrices by applying the suggestions of [126] on Mehra’s approach
and using the Kronecker operator. The algorithm defines a multistep autocovariance
function between the measurements, which is used to develop a linear least squares
formulation to estimate the noise covariance matrices. Duník et al. [52] compared the
method presented by Odelson, Rajamani, and Rawlings [132] to a combined state and
parameter estimation approach.
An interesting variant of the correlation methods is to utilize the output correlations.
In 1972, Mehra [114] proposed an output correlation technique to directly estimate
the optimal Kalman filter gain. This method has the advantage of being non-recursive
compared to the innovation correlation techniques. However, the poor estimates of
sample output correlation functions can lead to an ill-conditioned Riccati equation.
The contributions of the present chapter are as follows:
3. Several novel approaches to estimate the unknown noise covariance matrix R are
derived via utilization of the post-fit residual, which has not yet been discussed
in the literature.
4. Convergence proofs in [31] assumed that time averages are the same as ensemble
averages. This is only approximate with finite data. Consequently, these methods
either diverge or result in largely inaccurate estimates of unknown covariances.
5.2 Plant and Measurement Model for the Kalman
Filter
The notation used in the remainder of this chapter is listed in Table 5.1.
Consider the discrete-time linear dynamic system² with state transition matrix F,
measurement matrix H, process noise v(k) with gain matrix Γ, and measurement
noise w(k).

Table 5.1
Summary of Notation

The Kalman filter propagates the state estimate from time k to the next time instant
k + 1, where the estimate x̂(k + 1|k) is the one-step extrapolated estimate of the state
vector x(k) based on the measurements up to k; W(k), k = 1, . . . , N, is the sequence of
Kalman filter gains; ν(k), k = 1, . . . , N, is the innovation sequence; P(k + 1|k) is the
state prediction covariance; S(k + 1) is the measurement prediction (or innovation)
covariance; and P(k + 1|k + 1) is the updated state error covariance.

²Detectability and stabilizability are all that are needed for a stable Kalman filter (i.e., state observability is not needed).
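For reference, one prediction/update cycle producing the quantities just named can be written as the following generic textbook sketch in Python (with Γ absorbed into Q for brevity); it is not the chapter's code.

import numpy as np

def kf_cycle(F, H, Q, R, x, P, z):
    x_pred = F @ x                               # x-hat(k+1|k)
    P_pred = F @ P @ F.T + Q                     # P(k+1|k)
    nu = z - H @ x_pred                          # innovation nu(k+1)
    S = H @ P_pred @ H.T + R                     # innovation covariance S(k+1)
    W = P_pred @ H.T @ np.linalg.inv(S)          # filter gain W(k+1)
    x_upd = x_pred + W @ nu                      # updated state estimate
    P_upd = P_pred - W @ S @ W.T                 # P(k+1|k+1)
    return x_upd, P_upd, nu, S, W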
The six-step approach in this chapter is designed specifically for a steady-state
Kalman filter. The steady-state state prediction covariance matrix P̄ satisfies an
algebraic Riccati equation.
The steady-state updated state covariance, denoted as P , can also be computed via
another algebraic Riccati equation (see Appendix A.1).
Evidently,

P = P̄ − W S W′   (5.12)

where (5.13) is the Joseph form of this identity; W and S are the steady-state optimal
gain and the steady-state innovation covariance, respectively, given by

W = P̄ H′ S⁻¹ = P̄ H′ (H P̄ H′ + R)⁻¹ = P H′ R⁻¹   (5.14)
S = E[ν(k)ν(k)′] = H P̄ H′ + R   (5.15)
Note that (Inx − W H) is invertible, but need not be stable (i.e., eigenvalues need not
be inside the unit circle).
One major issue in the previous literature involves the necessary conditions for
estimating the unknown covariance matrices. Mehra [113] claimed that the system
must be observable and controllable; however, Odelson [132] provided a counterexample
wherein the system was observable and controllable, but the full Q matrix was not
estimable. Following the ideas in [168], we prove that the necessary and sufficient
condition (as detailed in Appendix A.2) to estimate the unknown covariance matrices
of a system is directly related to the minimal polynomial of its stable closed-loop
filter matrix F̄, and a transformation of the innovations based on the coefficients of
the minimal polynomial. Let us define x̃(k + 1|k) to be the predicted error between
the state x(k + 1) and its predicted state x̂(k + 1|k); it evolves as

x̃(k + 1|k) = F̄ x̃(k|k − 1) + Γ v(k) − F W w(k)

where F̄ is defined in (5.16). We can also write ν(k) in terms of x̃, that is,
ν(k) = H x̃(k|k − 1) + w(k). The minimal polynomial of F̄, of order m, satisfies

Σ_{i=0}^{m} a_i F̄^{m−i} = 0,   a_0 = 1   (5.21)
Note that we apply the minimal polynomial of F̄ to ensure that the innovation in
(5.22) is stationary. Let us define ξ(k) as
ξ(k) = Σ_{i=0}^{m} a_i ν(k − i)   (5.23)
     = Σ_{i=0}^{m} a_i [ H Σ_{j=0}^{m−i−1} F̄^{m−i−1−j} { Γ v(k − m + j) − F W w(k − m + j) } + w(k − i) ]   (5.24)
     = Σ_{i=0}^{m} a_i [ H Σ_{l=i+1}^{m} F̄^{l−i−1} { Γ v(k − l) − F W w(k − l) } + w(k − i) ]   (5.25)
     = Σ_{l=1}^{m} H ( Σ_{i=0}^{l−1} a_i F̄^{l−i−1} ) { Γ v(k − l) − F W w(k − l) } + Σ_{l=0}^{m} a_l w(k − l)   (5.26)
     = Σ_{l=1}^{m} B_l v(k − l) + Σ_{l=0}^{m} G_l w(k − l)   (5.27)
where B_l and G_l are the coefficient matrices of the two moving average processes
driven by the process noise and the measurement noise, respectively, that is,

B_l = H ( Σ_{i=0}^{l−1} a_i F̄^{l−i−1} ) Γ   (5.28)

G_l = a_l I_{nz} − H ( Σ_{i=0}^{l−1} a_i F̄^{l−i−1} ) F W   (5.29)

G_0 = I_{nz}   (5.30)
we can rewrite (5.31) as

L_j = Σ_{l=1}^{n_v} Σ_{p=1}^{n_v} q_{lp} [ Σ_{i=j+1}^{m} b_{i,l} b′_{i−j,p} ] + Σ_{l=1}^{n_z} Σ_{p=1}^{n_z} r_{lp} [ Σ_{i=j}^{m} g_{i,l} g′_{i−j,p} ]   (5.32)

Splitting each double sum into its p = l and p > l parts (5.33) and invoking the
symmetry q_{lp} = q_{pl} and r_{lp} = r_{pl} yields

L_j = Σ_{l=1}^{n_v} { q_{ll} [ Σ_{i=j+1}^{m} b_{i,l} b′_{i−j,l} ] + Σ_{p=l+1}^{n_v} q_{lp} [ Σ_{i=j+1}^{m} ( b_{i,l} b′_{i−j,p} + b_{i,p} b′_{i−j,l} ) ] }
    + Σ_{l=1}^{n_z} { r_{ll} [ Σ_{i=j}^{m} g_{i,l} g′_{i−j,l} ] + Σ_{p=l+1}^{n_z} r_{lp} [ Σ_{i=j}^{m} ( g_{i,l} g′_{i−j,p} + g_{i,p} g′_{i−j,l} ) ] }   (5.34)
Algorithm 11 Construction of the noise covariance identifiability matrix I
 1: for j := 0 to m do
 2:   r ← j · n_z²
 3:   k ← 0
 4:   for l := 1 to n_v do
 5:     k ← k + 1
 6:     b ← Σ_{i=j+1}^{m} b_{i,l} b′_{i−j,l}
 7:     I(r + 1 : r + n_z², k) ← vec(b)
 8:     for p := l + 1 to n_v do
 9:       k ← k + 1
10:       d ← Σ_{i=j+1}^{m} ( b_{i,l} b′_{i−j,p} + b_{i,p} b′_{i−j,l} )
11:       I(r + 1 : r + n_z², k) ← vec(d)
12:     end for
13:   end for
14:   for l := 1 to n_z do
15:     k ← k + 1
16:     g ← Σ_{i=j}^{m} g_{i,l} g′_{i−j,l}
17:     I(r + 1 : r + n_z², k) ← vec(g)
18:     for p := l + 1 to n_z do
19:       k ← k + 1
20:       e ← Σ_{i=j}^{m} ( g_{i,l} g′_{i−j,p} + g_{i,p} g′_{i−j,l} )
21:       I(r + 1 : r + n_z², k) ← vec(e)
22:     end for
23:   end for
24: end for
The linear structure of (5.36) implies a full-rank condition on I. Since R is always
estimable because G_m (recall that m is the order of the minimal polynomial) is
invertible³, the number of unknowns in Q that can be estimated must be less than or
equal to the number of independent equations minus the number of unknowns in R.
That is,

rank(I) − n_R ≥ n_Q   (5.37)

³See Appendix A.3 for a detailed proof.
where n_R is the number of unknowns in R, and n_Q is the number of unknowns in Q.
To illustrate the necessity and sufficiency of this condition, consider an example
system from [132],

x(k) = [ 0.9 0 0 ; 1 0.9 0 ; 0 0 0.9 ] x(k − 1) + v(k − 1)   (5.38)

z(k) = [ 0 1 0 ; 0 0 1 ] x(k) + w(k)   (5.39)
with Q being a full 3 × 3 positive definite symmetric matrix and R being a full 2 × 2
positive definite symmetric matrix. Since the rank of I is not affected by W (the
observability condition is independent of the filter gain matrix), one can examine the
rank of I for W = 0 for convenience. In this case, the minimal polynomial coefficients
are

[ a_0 a_1 a_2 ]′ = [ 1 −1.8 0.81 ]′   (5.40)
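The coefficients in (5.40) can be reproduced numerically; the sketch below (assuming real eigenvalues, as in this example) finds, for each distinct eigenvalue of F̄ = F at W = 0, the smallest exponent at which the rank of (F̄ − λI)^k stops decreasing, and multiplies out the factors.

import numpy as np

F = np.array([[0.9, 0.0, 0.0],
              [1.0, 0.9, 0.0],
              [0.0, 0.0, 0.9]])

eigvals = np.unique(np.round(np.linalg.eigvals(F).real, 8))
poly = np.array([1.0])
for lam in eigvals:
    A = F - lam * np.eye(F.shape[0])
    Ak, k = A, 1
    while np.linalg.matrix_rank(Ak @ A) < np.linalg.matrix_rank(Ak):
        Ak, k = Ak @ A, k + 1                    # grow the exponent until rank stalls
    for _ in range(k):
        poly = np.convolve(poly, [1.0, -lam])    # multiply in (lambda - lam)^k
print(poly)                                      # [ 1.  -1.8  0.81]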
Here, I is a 12 × 9 matrix with a rank of 8. Since there are 9 unknown variables (6
in Q and 3 in R), the covariance matrix elements are not identifiable. However, if
E[v(k)v(k)0 ] is diagonal, as is typically the case, then the covariance matrix elements
are identifiable because there are only 6 unknown variables (full R matrix and three
diagonal elements of Q).
Another example to illustrate the necessity and sufficiency of this condition is to
consider the system

x(k) = [ 0.1 0 ; 0 0.2 ] x(k − 1) + [ 1 0 ; 0 2 ] v(k)   (5.44)

z(k) = [ 1 0 ] x(k) + w(k)   (5.45)
with Q being a 2 × 2 positive definite diagonal matrix and R being a scalar.
Similarly, we examine the rank of I for W = 0 and obtain the minimal polynomial
coefficients

[ a_0 a_1 a_2 ]′ = [ 1 −0.3 0.02 ]′   (5.46)

B_1 = [ 1 0 ],   B_2 = [ −0.2 0 ]   (5.47)
Here,

I = [ 1.04 0 1.09 ; −0.2 0 −0.31 ; 0 0 0.02 ]   (5.49)
has a rank of 2. Since there are 3 unknown variables (2 in Q and 1 in R), the covariance
matrix elements are not identifiable.
Note that the minimal polynomial can be used to estimate the unknown covariances
R and Q via quadratic programming techniques. Furthermore, it can be used to
estimate the optimal gain W , as in [168] and Appendix A.4; however, reliable and
accurate estimation of the parameters of vector moving average processes is still an
unresolved problem [58, 87, 105, 144].
There are two competing approaches for the estimation of the filter parameters W ,
R, Q, and P̄ . The first approach is to estimate the noise covariance matrices first
and subsequently the Kalman filter gain W and the predicted state covariance P̄ are
computed given the estimated noise covariance matrices [110, 164]. This method has
an underlying problem in that it involves the sum of two moving average processes.
Additionally, the autoregressive moving average (ARMA) approach, pioneered in
the econometric literature, does not extend naturally to sums of moving average
processes and we have found the resulting algorithms [58, 87, 105, 144] to have erratic
computational behavior.
The second approach is to estimate the Kalman filter gain W from the measured
data first [31, 113]. Given the optimal W , we can compute R, Q and P̄ (this approach
is applied in this chapter). The proposed R, Q and P̄ estimates in this chapter are
valid as long as an optimal gain W is provided. There are many ways to obtain the
optimal Kalman filter gain W . The techniques listed in this chapter to obtain the
optimal W , that is, Section 5.5 and Appendix A.4, are by no means all-inclusive,
and several such methods may be suitable for a given problem. For example, the
optimal gain W can be obtained from the suboptimal Kalman filter residual [35],
solving the minimal polynomial problem [168], utilizing the least squares method on
the observable form [30], and utilizing a second Kalman filter to track the error in the
estimated residual of the first Kalman filter [139], to name a few.
5.5 Estimation of W
This section includes the discussion of two different approaches to estimate the optimal
Kalman filter gain W , namely, the minimal polynomial approach and the successive
approximation, coupled with an adaptive gradient descent scheme, on a criterion based
on innovation correlations. The derivation of the minimal polynomial approach is
detailed in Appendix A.4. This approach assumes the system to be purely driven
by the optimal innovation. In doing so, the estimation of the optimal Kalman gain
can be achieved via a vector auto-regressive model approximation of a vector moving
average process. However, from limited testing on examples chosen in this chapter,
this approach was found to be numerically unstable, only performing well on systems
with no eigenvalues close to unity. In fact, the vector auto-regressive model has various
numerical problems and an accurate and reliable algorithm to obtain the solution still
remains to be developed [87]. Therefore, we omit this approach from the chapter and
focus on minimization of the innovation correlations using a successive approximation
and adaptive gradient descent method.
In the sequel, we describe in detail the approach of our chapter using the correlation-
based criterion. If the Kalman filter gain W is not optimal, the innovation sequence
{ν(k)}_{k=1}^{N} is correlated. We can use the innovation sequence of any stable
suboptimal filter to compute the sample autocorrelation

Ĉ(i) = ( 1/(N − M) ) Σ_{j=1}^{N−M} ν(j) ν(j + i)′,   i = 0, 1, 2, . . . , M − 1   (5.50)
We know that the optimal Kalman filter gain W makes the autocorrelation function
Ĉ(i) vanish for all i ≠ 0. The correlation matrix for i ≥ 1 is, as in [113],

C(i) = H F̄^{i−1} F [ P̄ H′ − W C(0) ]   (5.52)

We minimize the normalized correlation criterion

J = (1/2) tr Σ_{i=1}^{M−1} { [diag(Ĉ(0))]^{−1/2} Ĉ(i) [diag(Ĉ(0))]^{−1} Ĉ(i)′ [diag(Ĉ(0))]^{−1/2} }   (5.53)

where diag(C) is the Hadamard product of an identity matrix, of the same dimension
as C, with C:

diag(C) = I ◦ C   (5.54)
This objective function is selected to minimize the sum of the normalized Ĉ(i) with
respect to the corresponding diagonal elements of Ĉ(0) for i > 0. The optimal J
becomes 0 as the sample size N tends to ∞ because the time averages are the same
as ensemble averages given infinite data. Substituting (5.52) into (5.53) and utilizing
the cyclic property of trace, we have
J = (1/2) tr { Σ_{i=1}^{M−1} Θ(i) X E² X′ }   (5.55)

where Θ(i) = F′ (F̄′)^{i−1} H′ E² H F̄^{i−1} F and

X = Ψ − W C(0)   (5.58)
Ψ = P̄ H′   (5.59)
E = [diag(C(0))]^{−1/2}   (5.60)
⁴Detailed steps on the gradient computation are provided in Appendix A.5.
and X is obtained by rewriting (5.52) as the stacked linear system

[ HF ; HF̄F ; … ; HF̄^{M−2}F ] X = [ Ĉ(1) ; Ĉ(2) ; … ; Ĉ(M − 1) ]   (5.63)

whose least squares solution is given by the pseudoinverse,

X = [ HF ; HF̄F ; … ; HF̄^{M−2}F ]† [ Ĉ(1) ; Ĉ(2) ; … ; Ĉ(M − 1) ]   (5.64)
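The sample autocorrelations (5.50) and the criterion (5.53) are straightforward to compute from an innovation record; a numpy sketch follows, with nu an N × n_z array.

import numpy as np

def sample_autocorr(nu, M):
    # C-hat(i) = (1/(N-M)) * sum_j nu(j) nu(j+i)', i = 0..M-1, per (5.50).
    N = nu.shape[0]
    return [nu[:N - M].T @ nu[i:N - M + i] / (N - M) for i in range(M)]

def objective_J(C):
    # Normalized innovation-correlation criterion (5.53).
    E = np.diag(1.0 / np.sqrt(np.diag(C[0])))    # [diag(C(0))]^(-1/2)
    return 0.5 * sum(np.trace(E @ Ci @ E @ E @ Ci.T @ E) for Ci in C[1:])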
5.6 Estimation of R

5.6.1 General R

Given the steady-state optimal gain W and the innovation covariance S, whose
estimation is explained later in Section 5.8, let µ(k), k = 1, . . . , N, be the sequence of
post-fit residuals of the Kalman filter, that is, µ(k) = z(k) − H x̂(k|k), which can be
written in terms of the innovation as

µ(k) = (I_{nz} − H W) ν(k)   (5.67)

The joint covariance of the innovation and the post-fit residual is then given by (5.68),

E{ [ ν(k) ; µ(k) ] [ ν(k)′  µ(k)′ ] } = [ S  R ; R  R − H P H′ ]   (5.68)
Proof: On the right-hand side of (5.68), the (1,1) block is simply the definition of
the innovation covariance matrix in (5.15). Using (5.67), the (1,2) block in (5.68) is
given by

E[ν(k)µ(k)′] = (I_{nz} − H W) S   (5.70)

Using (5.7) and (5.8),

E[ν(k)µ(k)′] = S − H P̄ H′ = R   (5.71)

For the (2,2) block,

G = E[µ(k)µ(k)′]   (5.72)
  = (I_{nz} − H W) S (I_{nz} − H W)′   (5.74)
  = R (I_{nz} − H W)′ = R − R W′ H′   (5.75)

and, using (5.12),

G = R − H P H′   (5.76)
Note that, by using the Schur determinant identity [19, 175], the determinant of (5.68)
is

| S  R ; R  R − H P H′ | = |S| |G − R S⁻¹ R| = 0   (5.77)

Proposition 2: Given the steady-state optimal Kalman filter gain W, the innovation
covariance S, and the post-fit residual covariance G, the covariance matrix R can be
computed in the following five ways:
R1: R = (I_{nz} − H W) S   (5.78)
R2: R = (1/2) { E[µ(k)ν(k)′] + E[ν(k)µ(k)′] }   (5.79)
R3: Obtain R from G = R S⁻¹ R   (5.80)
R4: R = (1/2) [ G + S − H W S W′ H′ ]   (5.81)
R5: R = (1/2) [ G (I_{nz} − W′H′)⁻¹ + (I_{nz} − H W)⁻¹ G ]   (5.82)
For R3, substituting P = P̄ − W S W′ from (5.12) into (5.76), and using
W S W′ = P̄ H′ S⁻¹ H P̄, we obtain

G = R − H P̄ H′ + H P̄ H′ S⁻¹ H P̄ H′   (5.84)

and, using H P̄ H′ = S − R from (5.15),

G = R − (S − R) + (S − R) S⁻¹ (S − R)   (5.86)
  = R S⁻¹ R   (5.87)

Equivalently,

S = R G⁻¹ R   (5.88)
Note that (5.87) is a continuous-time algebraic Riccati equation⁵. Therefore, we can
estimate R by solving the continuous-time Riccati equation, as in [6], or via Kleinman's
method [90]. Some additional methods to solve the continuous-time algebraic Riccati
equation can be found in [101]. We can also interpret (5.80) in terms of a Linear
Quadratic Regulator (LQR) optimal control problem, where we obtain R as the
solution of the continuous-time algebraic Riccati equation associated with the optimal
gain in the LQR problem. The computation of R is also related to the simultaneous
diagonalization problem⁶ in linear algebra [175]. Note that, in the scalar case, R is
the geometric mean of the variance of the post-fit residual and the innovation, as in
the (1,2) block of (5.68).
For R4, we substitute (5.85) into (5.83) and rewrite G as

G = R − (S − R) + H W S W′ H′   (5.89)
  = 2R − S + H W S W′ H′   (5.90)

so that

R = (1/2) { G + S − H W S W′ H′ }   (5.91)

⁵That is, 0 · R + R · 0 − R S⁻¹ R + G = 0.
⁶The solution via Cholesky decomposition and eigendecomposition or simultaneous diagonalization can be found in Appendix A.6 and Appendix A.7, respectively.
For R5, recall (5.70) and rewrite (5.74) as G = R (I_{nz} − W′ H′); right-multiplying by
(I_{nz} − W′ H′)⁻¹ and symmetrizing yields (5.82), proving R5.

Note that R1–R5 are theoretically the same; however, they differ numerically. We
recommend R3, since it ensures the positive definiteness of R.
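One concrete way to solve (5.80) for a positive definite R is via matrix square roots (R is the matrix geometric mean of S and G); the chapter's own alternatives are continuous-time Riccati solvers and the simultaneous diagonalization of Appendices A.6–A.7. A sketch:

import numpy as np
from scipy.linalg import sqrtm

def estimate_R_via_R3(S, G):
    # Solve G = R S^-1 R for R > 0: with Sh = S^(1/2),
    # R = Sh (Sh^-1 G Sh^-1)^(1/2) Sh, assuming S and G are positive definite.
    Sh = sqrtm(S)
    Shi = np.linalg.inv(Sh)
    R = Sh @ sqrtm(Shi @ G @ Shi) @ Sh
    return np.real(R)                            # discard tiny imaginary round-off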
5.6.2 Diagonal R

When R is known to be diagonal, we estimate it as the diagonal matrix closest, in
the Frobenius norm ‖·‖_F, to the positive definite R estimated from R3, given in
Proposition 2. The solution is simply the diagonal elements of the estimated R from
R3. This can also be interpreted as a masking operation to impose structural
constraints on R, as discussed in the context of the estimation of Q in Section 5.7.
Note that R can also be estimated using one-step-lag smoothing on the post-fit
residuals. Let s(k) denote the one-step-lag smoothed residual, defined as in [123],
with the smoothing gain

W_1 = P̄ F̃′ P̄⁻¹ W   (5.100)

where F̃ is defined as

F̃ = (I_{nx} − W H) F = F⁻¹ F̄ F   (5.101)
From (5.66), we can also write s(k) as a one-step moving average process in the
innovations. Therefore,

E[s(k)ν(k)′] = (I_{nz} − H W) C(0) − H W_1 C(1)′   (5.104)

and, for the optimal Kalman filter gain W (for which C(1) = 0 and C(0) = S), (5.104)
reduces to E[s(k)ν(k)′] = (I_{nz} − H W) S = R. A similar expression can be derived
for E[s(k)µ(k)′], that is,

E[s(k)µ(k)′] = R S⁻¹ R = G   (5.107)
and with the optimal Kalman filter gain W , combined with (5.14), we get
E[s(k)s(k)0 ] = RS −1 R + H P̄ F̃ 0 P̄ −1 W SW 0 P̄ −1 W SW 0 P̄ −1 F̃ P̄ H 0 (5.109)
= RS −1 R + RW 0 F 0 H 0 S −1 SS −1 HF W R (5.111)
= R(S −1 + W 0 F 0 H 0 S −1 HF W )R (5.112)
Note that E[s(k)s(k)0 ] can be used in a manner similar to the algorithm in Section 5.5
to obtain the optimal Kalman filter gain W . More investigation is needed into this
approach.
5.7 Estimation of Q, P and P̄
In this section, we discuss a method to estimate the process noise covariance Q and
the state prediction (updated) covariance P̄ (P ). Unlike the case of a Wiener process
and for a process with H = I, where both Q and P̄ can be estimated separately and
without iteration, as shown in Section 5.9.1.3, Q and P̄ (P ) are coupled in the general
case, requiring multiple iterations for the estimation to converge. The relationship
between the steady-state state prediction covariance matrix P̄ and the steady-state
updated state covariance matrix P with the process noise covariance matrix Q is
P̄ = F P F′ + Γ Q Γ′   (5.113)
  = F ( P̄⁻¹ + H′ R⁻¹ H )⁻¹ F′ + Γ Q Γ′   (5.114)
  = F̄ P̄ F̄′ + F W R W′ F′ + Γ Q Γ′   (5.115)

where F̃ is defined as in (5.101) and (5.117) is derived utilizing (5.14) and the fact
(from [11]) that

P = (I_{nx} − W H) P̄   (5.119)
We also define P̃ as

P̃ ≜ F P F′ = F̄ P̃ F̄′ + F W R W′ F′ + F̄ Γ Q Γ′ F̄′   (5.120)

Given P̃ and S, or P and S, or P̄ and S, we can compute ΓQΓ′ in the following ways:

Q1: Γ Q Γ′ = F⁻¹ P̃ (F⁻¹)′ + W S W′ − P̃   (5.121)
Q2: Γ Q Γ′ = P + W S W′ − F P F′   (5.122)
Q3: Γ Q Γ′ = P̄ − F P̄ F′ + F W S W′ F′   (5.123)
The iteration starts with an initial Q^{(0)} (we use (5.167)) and an arbitrary positive
definite P^{(0)}. We compute P^{(ℓ+1)} utilizing (5.118) until the value converges, that is,

P^{(ℓ+1)} = [ ( F P^{(ℓ)} F′ + Γ Q^{(t)} Γ′ )⁻¹ + H′ R⁻¹ H ]⁻¹   (5.125)

Upon convergence to P, we compute

D^{(t+1)} = P + W S W′ − F P F′   (5.126)
Then, we can update Q^{(t+1)} from (5.122), extracting Q from ΓQΓ′ = D^{(t+1)}.
A mask matrix A can shape Q to enforce structural constraints (e.g., a diagonal
covariance). The mask matrix comprises binary elements, with a 1 in the desired
positions and 0 elsewhere (for example, an identity matrix enforces a diagonal Q).
Then Q is structured by

Q^{(t+1)} = A ◦ Q^{(t+1)}   (5.128)
After the estimate of Q converges, we can estimate P̄ using either (5.113), (5.114) or
(5.115).
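A compact sketch of this coupled iteration is given below; the pseudoinverse extraction of Q from ΓQΓ′ and the fixed iteration counts are illustrative choices, not the chapter's exact implementation.

import numpy as np

def estimate_Q_P(F, H, Gamma, R, W, S, A_mask, outer=20, inner=200):
    Gp = np.linalg.pinv(Gamma)
    Q = Gp @ (W @ S @ W.T) @ Gp.T                # initialization based on (5.167)
    P = np.eye(F.shape[0])
    for _ in range(outer):
        for _ in range(inner):                   # Riccati-type recursion (5.125)
            Pbar = F @ P @ F.T + Gamma @ Q @ Gamma.T
            P = np.linalg.inv(np.linalg.inv(Pbar) + H.T @ np.linalg.inv(R) @ H)
        D = P + W @ S @ W.T - F @ P @ F.T        # Gamma Q Gamma', per (5.122)
        Q = A_mask * (Gp @ D @ Gp.T)             # structural mask, per (5.128)
    return Q, Pbar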
Given the methods to obtain estimates of R and Q in Sections 5.6 and 5.7, we
summarize our method into a six-step solution approach to obtain the optimal steady-
state W , S, P (P̄ ), Q, and R.
5.8.1 Step 1
Start with iteration r = 0 and initialize with a gain W^{(0)} that stabilizes the system,
as in [91]. We then execute the Kalman filter for samples k = 1, 2, . . . , N.
5.8.2 Step 2

Using the innovations from Step 1, compute the sample autocorrelation matrices Ĉ(i),
i = 0, 1, . . . , M − 1, via (5.50) and the objective function J via (5.53).
5.8.3 Step 3
In this step, we check whether any of the termination conditions given below are met.
If none of the termination conditions are met, we update the Kalman filter gain via
the proposed method, detailed later in Section 5.8.3.2.
There are five conditions that result in algorithm termination, subsequently yielding a
Kalman filter gain W for the R, Q, and P̄ estimates in later steps.

Condition 1: The relative change in the Kalman filter gain, δW in (5.135), is within
a specified threshold ζ_W.

Condition 2: The gradient of the objective with respect to the Kalman filter gain,
(5.61), is within a specified threshold ζ_∆.

Condition 4: The objective function value stops improving, given a specified "patience"
(number of epochs, detailed in Section 5.8.3.2) for the adaptive gradient method.
5.8.3.1.1 Condition 1: Let ∆W be the change in the Kalman filter gain from
iteration r to r + 1, that is,

∆W = W^{(r+1)} − W^{(r)}   (5.134)

and define the relative change

δW = ‖ ∆W ./ ( W^{(r)} + ε_W ) ‖   (5.135)

where ./ indicates element-wise division, ‖·‖ is a matrix norm (in this chapter, the
Euclidean norm), and ε_W is a very small value to protect against zeros in the
denominator. When δW is less than a specified threshold ζ_W, the Kalman filter gain
is assumed to have converged and we terminate the algorithm; otherwise, we update
the Kalman filter gain W for the next iteration.
Similarly, the algorithm terminates when the Euclidean norm of ∇_W J is less than a
sufficiently small threshold ζ_∆.
When any of the above conditions are met, we terminate the algorithm. Otherwise, we
update the Kalman filter gain W for the next iteration r + 1 via the gradient direction
in (5.61). Given the gradient direction, the Kalman filter gain at iteration r + 1 is
updated by

W^{(r+1)} = W^{(r)} − α^{(r)} ∇_W J   (5.137)

where α^{(r)} is the step size for the proposed method. The step size is initialized as

α^{(0)} = min( c (N/N_s)^β , c )   (5.138)
where c is a positive constant and is used to update the Kalman filter gain in the first
iteration, Ns is a hyperparameter on the number of observations, and β is a positive
constant to adapt the initial step size to the number of observations. Note that (5.138)
is selected to automatically tune the initial step size. When only a small subset of
samples are observed, we want to use a smaller step size to prevent large steps that
could result in unstable gains. If a line search is used instead, initialization is not
necessary. Use of stochastic approximation type step sizes will enable one to extend
the estimation method to on-line situations and the extended Kalman filter.
Subsequently, α^{(r)} is computed using the bold driver method in [12, 97, 177]. That
is, after each iteration, we compare J^{(r)} to its previous value, J^{(r−1)}, and set

α^{(r)} = 0.5 α^{(r−1)}  if J^{(r)} > J^{(r−1)};   α^{(r)} = max( 1.1 α^{(r−1)}, c̄ )  otherwise   (5.139)

where

c̄ = min( (N/N_s)^β , c_max )   (5.140)
Once we update the Kalman filter gain W, we return to Step 1 by setting r = r + 1
and repeat the same process until any of the five termination conditions is met.
Note that each time J (r) ≤ J (r−1) , we save the corresponding Kalman filter gain
W (r) and J (r) , and we halve the step size each time J (r) > J (r−1) in the hope of
observing a decrease in J (r) . If the value of J (r) has consecutively increased for a
specified number of iterations (i.e., given a “patience” factor), we select the best
Kalman filter gain W by
W = arg min J (r) (5.141)
r
We then terminate the iteration and move on to Step 4 after repeating Steps 1 and 2
with the corresponding W . Note that adaptive stochastic gradient descent methods
can be applied to compute the optimal Kalman filter gain W as in [80,89,127,169,183].
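The bold driver update (5.139)–(5.140) itself is only a few lines; a sketch:

def bold_driver(alpha_prev, J_curr, J_prev, c_bar):
    # Halve the step after an increase in J; otherwise grow it by 10%,
    # bounded below by c_bar, per (5.139).
    if J_curr > J_prev:
        return 0.5 * alpha_prev
    return max(1.1 * alpha_prev, c_bar)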
5.8.4 Step 4
Once we obtain the optimal steady-state Kalman filter gain W and the corresponding
innovation covariance S, we can compute the unknown R, as in Section 5.6.
5.8.5 Step 5
Given the covariance matrix R, computed in Step 4, we can compute the covariance
matrix Q and steady-state state prediction covariance matrix P̄ , as detailed in Section
5.7.
5.8.6 Step 6
5.9 Special Cases

In this section, we consider two special cases. The first case is when the state
transition matrix F and the measurement matrix H are both identity matrices, I_{nx}
and I_{nz}, where n_x = n_z; this considerably simplifies our method to estimate R and
Q. The second special case is when only the measurement matrix H is the identity
matrix, while the state transition matrix F remains general. Note that we can extend
either case to one assuming perfect measurements, that is, when H = I_{nx}, we may
have no measurement noise, and thus R = 0.
5.9.1.1 Kalman Filter Gain Update for a Wiener Process
For a Wiener process, let ξ(k) = z(k) − z(k − 1); in terms of the optimal innovations,
ξ(k) = ν(k) + (W − I_{nx}) ν(k − 1). Define

L_0 = E[ξ(k)ξ(k)′] = S + (W − I_{nx}) S (W − I_{nx})′   (5.148)
L_1 = E[ξ(k)ξ(k − 1)′] = (W − I_{nx}) S   (5.149)

Note that both L_0 and L_1 can be computed from samples. Additionally, we can obtain
the optimal W from L_1 as

W = I_{nx} + L_1 S⁻¹   (5.150)
Substituting W in (5.150) into (5.148), we can write the relationship between L_0 and
L_1 as

L_0 = S + L_1 S⁻¹ S S⁻¹ L_1′   (5.151)
    = S + L_1 S⁻¹ L_1′   (5.152)

Note that (5.152) is in a form related to the discrete algebraic Riccati equation and
has a positive definite solution [53].
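Thus, for the Wiener case, W can be recovered directly from sample statistics. The sketch below computes L_0 and L_1 from the differenced measurements and iterates S = L_0 − L_1 S⁻¹ L_1′, a fixed-point rearrangement of (5.152); convergence to the positive definite solution is assumed here, per [53].

import numpy as np

def wiener_gain(z, iters=200):
    # z: (N, n) array of measurements from a Wiener process (F = H = I).
    xi = np.diff(z, axis=0)                      # xi(k) = z(k) - z(k-1)
    N = xi.shape[0]
    L0 = xi.T @ xi / N                           # sample L0 = E[xi xi']
    L1 = xi[1:].T @ xi[:-1] / (N - 1)            # sample L1 = E[xi(k) xi(k-1)']
    S = L0.copy()
    for _ in range(iters):                       # fixed point of (5.152)
        S = L0 - L1 @ np.linalg.solve(S, L1.T)
    W = np.eye(S.shape[0]) + L1 @ np.linalg.inv(S)
    return W, S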
Proposition 3: For a Wiener process, where the state transition matrix F and the
measurement matrix H are both identity matrices, I_{nx} and I_{nz}, respectively, with
n_x = n_z, and given the optimal steady-state Kalman filter gain W and the
concomitant post-fit residual sequence µ(k) and innovation sequence ν(k), the
covariance matrix R can be computed in the following ways (SR1–SR4 specialize
R1–R4 with H = I_{nz}):

SR1: R = (I_{nz} − W) S   (5.153)
SR2: R = (1/2) { E[µ(k)ν(k)′] + E[ν(k)µ(k)′] }   (5.154)
SR3: Obtain R from G = R S⁻¹ R   (5.155)
SR4: R = (1/2) [ G + S − W S W′ ]   (5.156)
SR5: R = G − W S W′ + (1/2) ( W S + S W′ )   (5.157)

Proof: SR1–SR4 are directly proven by substituting H = I_{nz} into R1–R4. For
SR5, we know from (5.8) that

W S = P̄   (5.158)
S = P̄ + R   (5.159)

Then,

G = E[µ(k)µ(k)′] = (I_{nz} − W) S (I_{nz} − W)′   (5.160)
  = S − W S − S W′ + W S W′   (5.162)
  = R − S W′ + W S W′   (5.163)

so that

R = G + S W′ − W S W′   (5.164)

Symmetrizing (5.164),

R = G − W S W′ + (1/2) ( W S + S W′ )   (5.165)

proving SR5.
5.9.1.3 Estimation of P̄ and Q for a Wiener Process
Unlike the general case, where multiple iterations are needed to estimate both Q and
P̄ , in the case of a Wiener process, we can estimate them with no iteration.
Proposition 4: For a Wiener process, where the state transition matrix F and the
measurement matrix H are both identity matrices, I_{nx} and I_{nz}, respectively, and
given the optimal steady-state Kalman filter gain W and the corresponding innovation
sequence ν(k), the steady-state state prediction covariance P̄ and the process noise
covariance Q can be computed as

P̄ = W S   (5.166)
Q = W S W′   (5.167)
Proof: Given the relationship in (5.8) and knowing that H = I_{nz} for a Wiener
process, (5.166) follows directly.
For a Wiener process, we can rewrite the Riccati equation (5.10) as
P̄ = P̄ − W SW 0 + Q (5.169)
Thus, for a Wiener process, Q can be estimated as
Q = W SW 0 (5.170)
Hence, (5.167) is proven. Note that (5.167) is used as Q^{(0)} in the general case for
iteratively computing Q. Also note that, when R = 0 (i.e., the perfect measurement
case), we have

W = I_{nx}   (5.171)
P = 0   (5.172)
G = 0   (5.173)
Q = S = P̄   (5.174)
In the second case, only H is the identity matrix, but F is not necessarily so. Let ξ(k)
be

ξ(k) = z(k) − F z(k − 1)   (5.177)
Define L_0 = E[ξ(k)ξ(k)′] and L_1 = E[ξ(k)ξ(k − 1)′] as before. Then

F̄ = F (I_{nx} − W)   (5.180)
L_0 = S + F̄ S F̄′   (5.181)
L_1 = −F̄ S   (5.182)
S + L_1 S⁻¹ L_1′ = L_0   (5.183)
W = I_{nz} + F⁻¹ L_1 S⁻¹   (5.184)
and we can calculate R from R3, as in (5.87), where G can be obtained by running
the filter given the optimal Kalman filter gain. Note that we can also write ξ(k) in
terms of the noise sequences as

ξ(k) = Γ v(k − 1) + w(k) − F w(k − 1)   (5.186)

Then, L_0 is

L_0 = Γ Q Γ′ + R + F R F′   (5.187)

so that

Γ Q Γ′ = S + F̄ S F̄′ − ( R + F R F′ )   (5.188)
       = S + F G F′ − ( R + F R F′ )   (5.189)

In the perfect measurement case (R = 0), these reduce to

L_0 = S = Γ Q Γ′ = P̄   (5.191)
The performance of the proposed approach is evaluated on several test cases, including:

1. A second-order kinematic system (a white noise acceleration or nearly constant
velocity model), varying the lags M in the correlation.
Each case is simulated with 100 Monte Carlo (MC) runs with an assumed “patience”
of 5, ζJ = 10−6 , ζW = 10−6 , ζ∆ = 10−6 , c = 0.01, cmax = 0.2, β = 2 and the maximum
outer-loop iteration limit is set to 20. Case 5 is simulated with 200 MC runs to be
compatible with the results in [132].
For each test case, we examine the condition number of the system’s observability
and controllability matrices, as well as matrix I. The condition number of matrix A
is computed as

κ(A) = ‖A‖ ‖A†‖   (5.192)
where A† is the pseudoinverse of A and ‖·‖ is the Euclidean norm. The rank of matrix
I is also examined for each test case. For each test case result, we compute the 95%
probability interval (PI) via the highest probability interval method⁷ and denote by
r̲ and r̄ the corresponding lower and upper limits, respectively. We also provide the
mean and the root mean squared error (RMSE) of each distribution. The averaged
normalized innovation squared (NIS) is also provided to measure the consistency of
the Kalman filter,

ε̄(k) = ( 1/n_MC ) Σ_{i=1}^{n_MC} ν_i(k)′ S⁻¹ ν_i(k)   (5.193)

⁷The highest probability interval is, assuming unimodality, the minimum width interval such that the estimates of the parameter within the interval have a specified higher probability density than points outside of the interval.
where nMC is the number of MC runs. The elements of each matrix A are denoted as
aij , representing the element in the ith row and the j th column of A.
5.10.1 Case 1

The first test case is the second-order kinematic system described in (5.194)–(5.195),
with zero-mean white process and measurement noises whose correlations are
proportional to δ_kj, where δ_kj is the Kronecker delta function; the corresponding
variances are given in (5.196) and (5.197), respectively. Note that the system has a
condition number of 20.1 for its observability matrix and 20.2 for its controllability
matrix. The noise
covariance identifiability matrix I, given the initial Kalman filter gain in (5.199), is

I = [ 5·10⁻⁵  6 ; 2.5·10⁻⁵  −4 ; 0  1 ]   (5.198)

which has a rank of 2, and we have 2 unknown variables to estimate, implying that
Q and R are identifiable. The condition number of I is 1.5·10⁵, so the least squares
problem using the minimal polynomial approach is ill-conditioned.
We performed 100 MC runs, where each run contained N = 1000 sample observations.
We set nL = 100, Ns = 1000, and vary the lags, M = 10, 20, 30, 40, 50, 100, with an
initial Kalman filter gain

W^{(0)} = [ 0.1319  0.0932 ]′   (5.199)

obtained by solving the Riccati equation with Q^{(0)} = 0.1 and R^{(0)} = 0.1. Figs. 5.1
and 5.2 show the box plots of the estimated R (using R3⁸) and Q over the 100 MC
runs, respectively, with varying M.
The bottom and top of each "box" are the first (denoted Q_1) and third (denoted
Q_3) quartiles of the estimates, respectively. The line in the middle of each box is the
median estimate. The distances between the tops and bottoms are the interquartile
ranges (IQR = Q_3 − Q_1). The whiskers extend above and below each box, from
each end of the interquartile range to the upper (Q_3 + 1.5 IQR) and lower
(Q_1 − 1.5 IQR) adjacent values. Estimates beyond the whisker length are marked as
outliers (indicated by the "+" symbols). The accuracies of the estimates of both R
and Q increase with increasing M. Table 5.2 shows the mean values of the estimates
of both R and Q. The smallest error of the median of the estimates of R and the
smallest variability of the estimates of Q are obtained with M ≥ 100.

⁸All methods (R1–R5) obtain the same values.

Figure 5.1: 100 Monte Carlo runs for the Kalman filter R estimation using method R3 with various M.

Table 5.2
Monte Carlo Simulation for Case 1 Varying the Number of Lags M (Method R3)

     M:  10      20      30      40      50      100
R mean:  0.0100  0.0100  0.0100  0.0100  0.0100  0.0100
Q mean:  0.0048  0.0030  0.0027  0.0026  0.0025  0.0025
Figure 5.2: 100 Monte Carlo runs for the Kalman filter Q estimation with various M.
Given M = 100, for 100 MC runs with the initial Kalman Filter gain as in (5.199),
we found that R1–R5 estimate the same R values. The true values of R all lie within
the 95% PI associated with the distribution of estimates. Fig. 5.3 shows the Q versus
R plot of each estimate, with the true values marked by "+" symbols. The estimated
Q varies widely because its value is very small compared to the measurement noise.
Fig. 5.4 shows the averaged NIS and its 95% probability region, which indicates that
the filter is consistent.
Table 5.3
Monte Carlo Simulation for Case 1 with M = 100 and PI = 2σ (100 Runs)

        R (R1)      R (R2)      R (R3)      R (R4)      R (R5)      Q
Truth   0.01        0.01        0.01        0.01        0.01        0.0025
r̲       0.0092      0.0092      0.0092      0.0092      0.0092      6.49·10⁻⁴
Mean    0.0100      0.0100      0.0100      0.0100      0.0100      0.0025
r̄       0.0109      0.0109      0.0109      0.0109      0.0109      0.0046
RMSE    4.41·10⁻⁴   4.41·10⁻⁴   4.41·10⁻⁴   4.41·10⁻⁴   4.41·10⁻⁴   0.0010

        W_11        W_21
Truth   0.0952      0.0476
r̲       0.0697      0.0255
Mean    0.0925      0.0465
r̄       0.1250      0.0634
RMSE    0.0147      0.0100

        P̄_11        P̄_22
Truth   0.0011      5.13·10⁻⁴
r̲       8.47·10⁻⁴   2.82·10⁻⁴
Mean    0.0011      5.13·10⁻⁴
r̄       0.0013      8.72·10⁻⁴
RMSE    1.26·10⁻⁴   1.60·10⁻⁴
Figure 5.3: Q versus R estimates for Case 1 over 100 Monte Carlo runs (true values marked by "+").
5.10.2 Case 2

In Case 2, we consider the system

x(k) = [ 0.8 1 ; −0.4 0 ] x(k − 1) + [ 1 ; 0.5 ] v(k − 1)   (5.200)

z(k) = [ 1 0 ] x(k) + w(k)   (5.201)

where the noise variances are given in (5.202) and (5.203).
Figure 5.4: Averaged NIS for Case 1.
The system's condition numbers for its observability and controllability matrices are
2.18 and 2.56, respectively. Here, I, given the initial Kalman filter gain, is

I = [ 1.25 1.8 ; 0.5 −1.12 ; 0 0.4 ]   (5.204)
and the rank is 2. The number of unknown variables is 2; therefore, the system noise
variances are estimable. The condition number of I is 2.3, and indeed the minimal
polynomial approach works well for this problem. We simulated 100 Monte Carlo runs
with N = 1000, nL = 100, Ns = 1000, and an initial suboptimal Kalman filter gain

W^{(0)} = [ 0.9  0.5 ]′   (5.205)
Table 5.4
Monte Carlo Simulation for Case 2 with M = 100 and PI = 2σ (100 Runs)

        R (R1)  R (R2)  R (R3)  R (R4)  R (R5)  Q
Truth   1.00    1.00    1.00    1.00    1.00    1.00
r̲       0.56    0.56    0.56    0.56    0.56    0.78
Mean    1.05    1.05    1.05    1.05    1.05    0.97
r̄       1.51    1.51    1.51    1.51    1.51    1.18
RMSE    0.25    0.25    0.25    0.25    0.25    0.11

        W_1     W_2
Truth   0.65    0.09
r̲       0.49    −0.03
Mean    0.63    0.10
r̄       0.80    0.25
RMSE    0.08    0.07

        P̄_11    P̄_22
Truth   1.89    0.35
r̲       1.59    0.30
Mean    1.87    0.35
r̄       2.07    0.39
RMSE    0.12    0.02
Table 5.4 shows the estimated noise variances. Similar to the Case 1 results, the mean values of the estimated parameters are very close to their corresponding true values. As seen in Table 5.4, the true values lie within the 95% PI associated with the distribution of estimates for each of Q, R, W, and P̄ii. Fig. 5.5 shows the Q and R estimates for each MC run. As shown in Fig. 5.6, the Kalman filter is consistent.
Figure 5.5: Q and R estimation for Case 2.
5.10.3 Case 3

In Case 3, we test on the example in [113]. The system matrices are assumed to be as follows:

\[
F = \begin{bmatrix} 0.75 & -1.74 & -0.3 & 0 & -0.15 \\ 0.09 & 0.91 & -0.0015 & 0 & -0.008 \\ 0 & 0 & 0.95 & 0 & 0 \\ 0 & 0 & 0 & 0.55 & 0 \\ 0 & 0 & 0 & 0 & 0.905 \end{bmatrix} \tag{5.206}
\]
Figure 5.6: Averaged NIS for Case 2.

\[
\Gamma = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 24.64 & 0 & 0 \\ 0 & 0.835 & 0 \\ 0 & 0 & 1.83 \end{bmatrix} \tag{5.207}
\]

\[
H = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} \tag{5.208}
\]

\[
Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{5.209}
\]
The condition number of the observability matrix is 42.6, and that of the controllability matrix is 54.6. The system has rank(I) equal to 5 (utilizing the constraint that both R and Q are diagonal), with a total of 5 unknowns. Hence, the Q and R parameters are identifiable. The condition number of the noise covariance identifiability matrix I is 808. The initial Kalman filter gain is obtained by solving the Riccati equation with
\[
Q(0) = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.75 \end{bmatrix} \tag{5.210}
\]

\[
R(0) = \begin{bmatrix} 0.4 & 0 \\ 0 & 0.6 \end{bmatrix} \tag{5.211}
\]
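The initial gain can be obtained from (5.210)–(5.211) by solving the steady-state Riccati equation, e.g., with SciPy’s DARE solver (a sketch using the standard filter-form duality; this is our illustration, not code from this work):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def initial_gain(F, Gam, H, Q0, R0):
    """Solve the filter-form DARE for the steady-state prediction
    covariance Pbar, then form the gain W = Pbar H' (H Pbar H' + R)^-1."""
    Pbar = solve_discrete_are(F.T, H.T, Gam @ Q0 @ Gam.T, R0)
    S = H @ Pbar @ H.T + R0
    return Pbar @ H.T @ np.linalg.inv(S)

Q0 = np.diag([0.25, 0.5, 0.75])      # (5.210)
R0 = np.diag([0.4, 0.6])             # (5.211)
# W0 = initial_gain(F, Gam, H, Q0, R0), with F, Gamma, H from (5.206)-(5.208)
```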
Both Mehra’s [113] and Bélanger’s [14] methods for updating the Kalman filter gain W can be unstable unless a large number of data samples is observed. This is because the time average converges slowly to the ensemble average. We conducted 100 MC simulations with 10,000 data samples in each run for the five-state system described in (5.206)–(5.209). We then varied the number of observed samples from 100 to 10,000 and updated the Kalman filter gain using both the Mehra [113] and the Bélanger [14] methods. We measured the percentage of unstable Kalman filter gains over the 100 MC runs by checking, for each run, whether any eigenvalue of F̄ lies outside the unit circle. The results are shown in Figs. 5.7 and 5.8. We display only up to 5,000 samples for both methods because each approach terminated with a stable gain once the total number of observed samples exceeded 5,000; the minimum number of samples required to obtain a stable gain from these methods was about 4,500.
Figure 5.7: Percentage of unstable Kalman filter gains obtained from [113] for varying the total number of observed samples (M = 40).
Our proposed method always results in a stable Kalman filter gain; hence, it is not included in this comparison.
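The instability test used above can be sketched as follows (our illustration; we take the closed-loop matrix as F̄ = F(I − WH), consistent with the prediction-form recursion in Appendix A.4):

```python
import numpy as np

def gain_is_stable(F, H, W, tol=1e-9):
    """Declare a gain W stable if every eigenvalue of the closed-loop
    matrix Fbar = F (I - W H) lies inside the unit circle."""
    Fbar = F @ (np.eye(F.shape[0]) - W @ H)
    return bool(np.all(np.abs(np.linalg.eigvals(Fbar)) < 1.0 + tol))

# percentage over Monte Carlo runs, as plotted in Figs. 5.7-5.8:
# pct_unstable = 100.0 * np.mean([not gain_is_stable(F, H, W) for W in gains])
```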
Given the 100 MC simulations with 10,000 observation samples generated in Section 5.10.3.1, and setting nL = 500 and Ns = 10000, Table 5.5 shows the estimates of the Kalman filter gain W over 100 Monte Carlo runs for three different gain update methods: the proposed method with M = 40, Mehra’s method with M = 40 [113], and Bélanger’s method with M = 5 [14]. In Table 5.5, we see that for all methods the true values stay within the 95% PIs; however, our proposed method obtains the Kalman filter gain closest to the optimal one, with RMSEs that are, on average, 8 and 4 times smaller than those of Mehra’s and Bélanger’s methods, respectively.
Figure 5.8: Percentage of unstable Kalman filter gains obtained from [14] for varying the total number of observed samples (M = 5).
The very small gains W21 and W41 are (similar to the small Q in Case 1) very hard to estimate, as they are essentially buried in noise.
We test and compare the proposed method with Mehra’s and Bélanger’s for the estimation of R, Q, and P̄ using the methodology described in Sections 5.6.2 and 5.7, combined with the converged Kalman filter gains from Table 5.5. The results are shown in Table 5.6; with Mehra’s method, the true value of P̄33 falls outside the 95% PI. In comparison to Bélanger’s method, the proposed method is vastly more accurate, with RMSEs 2 to 9 times smaller for all of R, Q, and P̄, while Mehra’s method obtained results less accurate than Bélanger’s, as expected from the Kalman filter gain results. The reason r1 is so difficult to estimate is that S1 is dominated by the state uncertainty (S1 = 65, r1 = 1), i.e., the measurement noise is “buried” in a much larger innovation.
Table 5.5
W Estimation Monte Carlo Simulation for Case 3 (100 Runs; 10,000 Samples)

                   W11    W21         W31    W41         W51        W12    W22    W32    W42    W52
Truth              0.95   2.80·10⁻³   −2.86  −1.76·10⁻⁴  0.03       0.77   0.34   −1.49  0.25   −0.77

Proposed Method
PI lower           0.93   −8.65·10⁻³  −2.94  −0.02       0.02       0.71   0.31   −1.57  0.18   −0.84
Mean               0.95   2.52·10⁻³   −2.86  5.29·10⁻⁴   0.03       0.77   0.34   −1.50  0.25   −0.76
PI upper           0.97   0.01        −2.80  0.02        0.05       0.84   0.38   −1.41  0.30   −0.68
RMSE               0.01   5.33·10⁻³   0.04   9.31·10⁻³   9.60·10⁻³  0.03   0.02   0.05   0.03   0.04

Mehra’s Method
PI lower           0.92   −0.04       −3.38  −0.07       −0.11      0.18   0.04   −2.61  −0.12  −1.17
Mean               1.01   3.79·10⁻⁴   −3.15  3.59·10⁻³   −0.02      0.62   0.31   −1.34  0.28   −0.62
PI upper           1.08   0.06        −2.83  0.08        0.06       1.11   0.60   0.19   0.80   −0.11
RMSE               0.07   0.03        0.33   0.04        0.07       0.30   0.14   0.79   0.22   0.34

Bélanger’s Method
PI lower           0.89   −0.02       −3.02  −0.05       −0.04      0.45   0.13   −2.27  −0.02  −1.15
Mean               0.96   1.46·10⁻⁴   −2.85  3.85·10⁻³   0.03       0.77   0.33   −1.44  0.26   −0.77
PI upper           1.01   0.04        −2.70  0.04        0.10       1.13   0.52   −0.32  0.56   −0.32
RMSE               0.03   0.02        0.09   0.02        0.03       0.17   0.09   0.48   0.14   0.20
In the case of r2 = 1, one has S2 = 2.45, i.e., r2 is “visible” in the innovations.

Next, we vary the number of samples observed, N = 500, 2,500, 5,000, and 10,000, using our six-step approach. The results are detailed in Tables 5.7 and 5.8. As expected, the accuracy increases with an increase in N; the estimation is greatly degraded for N < 5,000. Fig. 5.9 illustrates that the Kalman filter is consistent.
Table 5.6
R, Q and P̄ Estimation Monte Carlo Simulation for Case 3 (100 Runs; 10,000 Samples)

                   r1     r2     q1     q2     q3     P̄11    P̄22    P̄33    P̄44    P̄55
Truth              1.000  1.000  1.000  1.000  1.000  72.31  1.143  1213   0.932  11.74

Proposed Method
PI lower           0.043  0.918  0.941  0.662  0.802  66.06  0.930  1141   0.633  9.639
Mean               1.067  1.008  0.998  1.000  0.994  72.36  1.146  1212   0.933  11.72
PI upper           1.976  1.107  1.058  1.261  1.166  77.97  1.334  1290   1.172  13.76
RMSE               0.554  0.052  0.031  0.170  0.097  2.906  0.106  37.87  0.153  1.083

Mehra’s Method
PI lower           0.102  0.676  0.995  0.060  0.153  69.01  0.540  1270   0.060  0.791
Mean               1.597  1.024  1.224  1.788  0.989  89.88  1.744  1505   1.715  14.22
PI upper           3.681  1.199  1.420  4.432  2.240  120.7  4.067  1855   4.138  34.21
RMSE               1.484  0.212  0.251  1.464  0.652  23.38  1.280  330.0  1.419  10.61

Bélanger’s Method
PI lower           0.0270 0.755  0.885  0.043  0.292  60.37  0.564  1069   0.043  2.507
Mean               1.171  1.008  0.992  1.319  1.117  74.58  1.416  1216   1.238  13.86
PI upper           2.631  1.254  1.126  3.160  2.198  92.54  2.765  1370   2.902  27.078
RMSE               0.789  0.122  0.064  0.829  0.516  9.461  0.667  81.21  0.764  6.829
5.10.4 Case 4

\[
x(k) = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.2 \end{bmatrix} x(k-1) + \begin{bmatrix} 1 \\ 2 \end{bmatrix} v(k-1) \tag{5.212}
\]

\[
z(k) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(k) + w(k) \tag{5.213}
\]
Table 5.7
R, Q and P̄ Estimation when Varying the Number of Samples Observed N, Monte Carlo Simulation for Case 3 (100 Runs; 500–10,000 Samples)

                   r1     r2     q1     q2     q3     P̄11    P̄22    P̄33    P̄44    P̄55
Truth              1.000  1.000  1.000  1.000  1.000  72.31  1.143  1213   0.932  11.74

N = 500
PI lower           0.075  0.708  0.715  0.593  0.431  62.97  0.965  855.5  0.579  7.505
Mean               2.246  1.014  0.919  1.343  1.120  73.25  1.396  1139   1.263  13.45
PI upper           5.221  1.358  1.137  2.847  1.542  89.88  2.204  1336   2.677  20.37
RMSE               2.060  0.182  0.133  0.663  0.308  6.998  0.398  136.6  0.626  3.511

N = 2,500
PI lower           0.041  0.767  0.864  0.635  0.719  64.94  0.941  1071   0.660  8.554
Mean               1.453  1.010  0.977  1.127  1.049  73.24  1.245  1195   1.056  12.49
PI upper           3.354  1.173  1.103  1.685  1.431  80.49  1.653  1342   1.638  16.44
RMSE               1.076  0.094  0.070  0.302  0.181  4.472  0.203  77.80  0.280  2.088

N = 5,000
PI lower           0.043  0.885  0.919  0.578  0.726  65.55  0.856  1126   0.550  8.864
Mean               1.100  1.011  0.997  1.008  0.981  72.27  1.148  1211   0.942  11.62
PI upper           2.580  1.161  1.084  1.397  1.265  80.13  1.390  1329   1.293  14.85
RMSE               0.757  0.077  0.045  0.218  0.140  3.692  0.136  52.74  0.197  1.530

N = 10,000
PI lower           0.043  0.918  0.941  0.662  0.802  66.06  0.930  1141   0.633  9.639
Mean               1.067  1.008  0.998  1.000  0.994  72.36  1.146  1212   0.933  11.72
PI upper           1.976  1.107  1.058  1.261  1.166  77.97  1.334  1290   1.172  13.76
RMSE               0.554  0.052  0.031  0.170  0.097  2.906  0.106  37.87  0.153  1.083
with
Table 5.8
W Estimation when Varying the Number of Samples Observed N, Monte Carlo Simulation for Case 3 (100 Runs; 500–10,000 Samples)

                   W11    W21         W31    W41         W51        W12    W22    W32    W42    W52
Truth              0.95   2.80·10⁻³   −2.86  −1.76·10⁻⁴  0.03       0.77   0.34   −1.49  0.25   −0.77

N = 500
PI lower           0.90   −0.04       −2.97  −0.05       −0.03      0.56   0.27   −1.64  0.14   −0.95
Mean               0.96   −1.57·10⁻³  −2.74  7.53·10⁻³   0.04       0.79   0.33   −1.49  0.26   −0.79
PI upper           1.01   0.03        −2.56  0.08        0.09       1.00   0.41   −1.27  0.44   −0.54
RMSE               0.03   0.02        0.17   0.03        0.03       0.11   0.03   0.11   0.08   0.12

N = 2,500
PI lower           0.92   −0.02       −2.97  −0.03       −6.51·10⁻³ 0.64   0.28   −1.64  0.14   −0.90
Mean               0.96   8.95·10⁻⁴   −2.82  2.17·10⁻³   0.03       0.77   0.33   −1.50  0.25   −0.78
PI upper           1.01   0.02        −2.62  0.05        0.07       0.95   0.40   −1.31  0.36   −0.59
RMSE               0.02   0.01        0.10   0.02        0.02       0.08   0.03   0.09   0.05   0.08

N = 5,000
PI lower           0.92   −0.02       −2.98  −0.03       8.68·10⁻³  0.67   0.30   −1.63  0.15   −0.89
Mean               0.95   2.02·10⁻³   −2.85  1.15·10⁻³   0.03       0.77   0.34   −1.49  0.25   −0.76
PI upper           0.98   0.01        −2.77  0.03        0.06       0.90   0.39   −1.32  0.32   −0.64
RMSE               0.02   7.08·10⁻³   0.06   0.01        0.01       0.06   0.02   0.08   0.05   0.06

N = 10,000
PI lower           0.93   −8.65·10⁻³  −2.94  −0.02       0.02       0.71   0.31   −1.57  0.18   −0.84
Mean               0.95   2.52·10⁻³   −2.86  5.29·10⁻⁴   0.03       0.77   0.34   −1.50  0.25   −0.76
PI upper           0.97   0.01        −2.80  0.02        0.05       0.84   0.38   −1.41  0.30   −0.68
RMSE               0.01   5.33·10⁻³   0.04   9.31·10⁻³   9.60·10⁻³  0.03   0.02   0.05   0.03   0.04
Figure 5.9: Averaged NIS for Case 3.
We simulated 100 MC runs with N = 1000 observed samples in each run. We set nL = 100, Ns = 1000, and λQ = 0.1. Table 5.9 shows the estimated parameters with the initial Kalman filter gain obtained by solving the Riccati equation with R(0) = 0.2 and Q(0) = 0.4. Note that the system is not fully observable, i.e., the condition number of the observability matrix is infinite, while that of the controllability matrix is 25.8. In Table 5.9, the true values lie within the 95% PI associated with each distribution. Fig. 5.10 shows a wide variation in the Q and R estimates; however, the NIS in Fig. 5.11 shows that the Kalman filter is consistent.
Figure 5.10: Q and R estimation for Case 4.
Figure 5.11: Averaged NIS for Case 4.
Table 5.9
Monte Carlo Simulation for Case 4 with M = 100 and PI = 2σ (100 Runs; 1,000 Samples)

             R          Q      W1     W2     P̄11    P̄22
Truth        1.00       1.00   0.50   1.01   1.01   4.08
PI lower     3.27·10⁻³  0.26   0.05   0.32   0.27   1.09
Mean         1.04       1.02   0.50   1.00   1.03   4.16
PI upper     2.05       2.09   1.22   2.00   2.09   8.37
RMSE         0.60       0.53   0.32   0.52   0.53   2.11
5.10.5 Case 5

\[
x(k) = \begin{bmatrix} 0.1 & 0 & 0.1 \\ 0 & 0.2 & 0 \\ 0 & 0 & 0.3 \end{bmatrix} x(k-1) + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} v(k-1) \tag{5.217}
\]

\[
z(k) = \begin{bmatrix} 0.1 & 0.2 & 0 \end{bmatrix} x(k) + w(k) \tag{5.218}
\]

with
The condition numbers of the observability and controllability matrices are 362 and 561, respectively; hence, this is an ill-conditioned case. With the initial Kalman filter gain,
the noise covariance identifiability matrix I is

\[
I = \begin{bmatrix} 0.28 & 1.37 \\ -0.09 & -0.67 \\ 0.006 & 0.11 \\ 0 & -0.006 \end{bmatrix} \tag{5.221}
\]
The rank of I is 2 and we have a total of 2 unknown variables, indicating that both Q and R are identifiable (albeit, due to the high condition number, not very well relative to the other systems tested). The condition number of I is 36.4. We simulated 200 MC runs with N = 1000 observed samples for each run. We set M = 15 to be consistent with the setup in [132]. We also set the maximum number of iterations nL = 100, Ns = 1000, and the regularization term from (5.129) to λQ = 0.3. Table 5.10 shows the estimated parameters with the initial Kalman filter gain obtained by solving the Riccati equation with R(0) = 0.1 and Q(0) = 0.5. The results are detailed in Table 5.10, where the true values stay within the 95% PI. Fig. 5.12 shows the scatter plot of the R and Q estimates for each MC run. The plot is similar to the estimates in [132]; however, our upper bound on Q is smaller than that of [132] by about 0.2, and [132] does not provide the kind of detailed results presented in Table 5.10. Fig. 5.13 shows that the Kalman filter is consistent.
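The rank and conditioning claims can be checked numerically from (5.221) (a sketch; because the printed matrix entries are rounded, the computed condition number lands near, rather than exactly at, 36.4):

```python
import numpy as np

# noise covariance identifiability matrix from (5.221)
I_mat = np.array([[ 0.28,   1.37 ],
                  [-0.09,  -0.67 ],
                  [ 0.006,  0.11 ],
                  [ 0.0,   -0.006]])

print(np.linalg.matrix_rank(I_mat))   # 2: rank equals number of unknowns
print(np.linalg.cond(I_mat))          # ratio of extreme singular values, ~35
```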
Table 5.10
Monte Carlo Simulation for Case 5 with M = 100 and PI = 2σ (100 Runs; 1,000 Samples)

Figure 5.12: Q and R estimation for Case 5.
Figure 5.13: Averaged NIS for Case 5.
Chapter 6
Conclusion
for a large number of solutions (in Chapter 2, we were interested in obtaining 10⁴ ranked solutions). We also evaluated different decomposition methods and compared their scalability and consistency with Murty’s search space decomposition. From our analysis, it can be seen that, when solving for a large number of solutions within a 3-D assignment problem, utilizing dual variable inheritance, tight upper bounds on the feasible reward, and partitioning in an optimized order offers the best performance, solving for all m-best solutions in a fraction of the time of the original Murty’s decomposition method, with little to no sacrifice in solution quality. These optimizations offered a maximum speedup of 10.8 over Murty’s search space decomposition. On average, it took 49.64 s to obtain 10⁴ solutions for a tensor of dimension 30×30×8, as required for the nuclear fuel loading problem, which was well within the 10-minute time limit placed on the algorithm.
In Chapter 3, we proposed five asset allocation algorithms for the maritime surveillance problem: 1) Exhaustive Branch-and-Cut: enumerate all possible asset-case combinations over all times and find the optimal allocation; 2) Greedy Branch-and-Cut-I: enumerate and solve for the best allocation for each asset, compute the probability of successful detection for each asset, and iteratively generate a schedule based on the highest probability of successful detection; 3) Greedy Branch-and-Cut-II: similar to Exhaustive Branch-and-Cut, except that the algorithm directly enforces the asset schedule once the best allocation is found; 4) Multi-Step Lookahead Approximate Dynamic Programming-I: utilize multi-step lookahead rollout in a heuristic to iteratively schedule asset-case assignments for individual time epochs; and 5) Multi-Step Lookahead Approximate Dynamic Programming-II: similar to Multi-Step Lookahead Approximate Dynamic Programming-I, except that the algorithm directly enforces the asset schedule based on the highest incremental reward. We validated each algorithm and solved the NP-hard counter-smuggling surveillance problem in a relatively short amount of time for any of the three objectives examined: maximizing the amount of contraband disrupted, the number of detections, or the number of smugglers detected. We found that the branch-and-cut-based methods are able to obtain more contraband when optimizing the amount of contraband disrupted, while the approximate dynamic programming approaches are better at optimizing the number of smugglers and the number of detections.
We conducted scalability and robustness analyses to evaluate the solution quality, runtimes, and contraband detection performance of each algorithm. We found that the algorithms scale reasonably well with the problem size. We also found that the approximate dynamic programming-based approaches are able to obtain effective asset allocations within seconds of computation time with a minimal sacrifice in optimality, while providing the most robust solutions as measured by the SNR metric. Additionally, we found the 2SLADP-II algorithm to be the best with respect to the nominal-the-best SNR. Our future work includes further sensitivity analyses with varying asset types, aloft times, numbers of unavailable assets, rest times, and spatio-temporal variations in the PoA surface (e.g., scenario-based asset allocation to handle uncertainty in PoA surfaces). Additionally, higher-fidelity simulations with more accurate detection models and other operational PoA surfaces (e.g., historical flow surfaces and active cases) could be analyzed. Future work also includes the incorporation of UAVs, either as in [73], where only UAVs collaborate, or in a mixed-initiative sense as an augmentation of our proposed approach.
In Chapter 4, we proposed a new approximate dynamic programming-based approach to Pareto optimization, named NAPO. The algorithm excels on the many-objective ship routing problem under complex and uncertain weather impacts. We formulated the problem as an approximate dynamic programming problem with uncertain, nonconvex stage costs, and developed a methodology that exploits A*, GMMs, and silhouette scores to approximate many-objective Pareto fronts. We applied the method to a real-world weather event, Tropical Storm Emily off the coast of Florida in early August 2017, and discussed our findings. We found that our algorithm runs several orders of magnitude faster than the other multi-objective algorithms evaluated, owing to its smart search and selection of Pareto labels to expand, with minimal sacrifice in Pareto optimality. With this method, up to 15 objectives can be optimized in 15 minutes in a dynamic, uncertain, weather-impacted environment. Future work includes adding wait time as a decision variable (i.e., not assuming the time of arrival is the time of departure at each node) and parallelizing the NAPO algorithm. Much of the approach is highly parallelizable, and much time can be saved if cost calculations are performed in parallel rather than sequentially. Further extensions to Monte Carlo tree search and Q-learning as in [23] will also be investigated.
In Chapter 5, we derived necessary and sufficient conditions for the identification of the process and measurement noise covariances of a Gauss-Markov system. We also provided a novel six-step successive approximation method, coupled with an adaptive gradient method, to estimate the steady-state Kalman filter gain W, the unknown noise covariance matrices Q and R, and the state prediction (or updated) covariance matrix P̄ (or P) when Q and R are identifiable. Moreover, we developed a novel iterative approach to obtain positive definite Q, R, and P̄, while ensuring that the structural assumptions on Q and R (e.g., diagonality, if appropriate, as well as symmetry and positive definiteness) are enforced. We provided several approaches to estimate the unknown noise covariance R via post-fit residuals. We examined previous methods from the literature and heretofore undiscussed assumptions of these methods that result in largely inaccurate or unstable estimates of the unknown parameters; the proposed method significantly outperformed the previous ones under the same system assumptions. We validated the proposed method on five different test cases and obtained parameter estimates whose 95% probability intervals contain the true values. In the future, we plan to pursue a number of research avenues, including 1) estimating Q and R using one-step lag smoothed residuals; 2) exploring vector moving average estimation algorithms using the minimal polynomial approach and/or truncating the effects of the state; 3) replacing the batch innovation covariance estimates by their individual or mini-batch estimates, as is done in machine learning, to enable real-time estimation; 4) investigating accelerated gradient methods (e.g., Adam [89], AdaGrad [51], RMSProp [169], conjugate gradient, memoryless quasi-Newton, and trust region methods [19]); 5) automatic model selection from a library of models; and 6) extension to nonlinear dynamic models.
Appendix A

From (5.6) and (5.9), we can write the steady-state state prediction covariance matrix as

\[
\bar{P} = P + W S W' = F P F' + \Gamma Q \Gamma' \tag{A.1}
\]

Thus,

\[
P = F P F' - W S W' + \Gamma Q \Gamma' \tag{A.2}
\]

\[
P = F P F' - P H' R^{-1} S R^{-1} H P' + \Gamma Q \Gamma' \tag{A.3}
\]
A.3 Proof of Estimability of R

Without loss of generality, let us assume that a_m ≠ 0 and that the closed-loop transition matrix F̄ is invertible. Note that W should be such that F̄ does not correspond to a deadbeat observer (which assumes no noise) or to an observer with zero eigenvalues for F̄. Since R is assumed to be positive definite, F̄ is always invertible [84, 136, 176]. When the Kalman filter gain W = 0, it is evident that

\[
G_m = a_m I_{n_z} \tag{A.6}
\]

Then,

\[
L_m = a_m R \tag{A.7}
\]

and R is clearly identifiable. When W is not zero, using (5.21) in (5.29), we have

Recall that (I_{n_z} − HW) is invertible because it relates the innovations and the post-fit residuals (see (5.67)). So, we have

\[
(I_{n_z} - HW)\, L_m = a_m R \tag{A.10}
\]

Thus, R is estimable.
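Relation (A.10) translates directly into a simple estimator sketch (our illustration; `Lm` denotes the estimated lag-m residual covariance and `a_m` the trailing minimal-polynomial coefficient, as in (A.7)–(A.10)):

```python
import numpy as np

def estimate_R(Lm, H, W, a_m):
    """Solve (I - H W) Lm = a_m R for R, per (A.10)."""
    nz = Lm.shape[0]
    R = (np.eye(nz) - H @ W) @ Lm / a_m
    return 0.5 * (R + R.T)            # symmetrize against numerical error
```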
A.4 Procedure to Obtain W Using the Minimal Polynomial

Let W_s be the suboptimal Kalman filter gain and ẽ be the difference between the state predictions of the optimal and the suboptimal filter, that is, ẽ(k|k−1) = x̂(k|k−1) − x̂_s(k|k−1). Then, using the minimal polynomial of F̄_s from (5.21), ν_s(k − i) can be written as

\[
\nu_s(k-i) = H\left[ \bar{F}_s^{m-i}\,\tilde{e}(k-m|k-m-1) + \sum_{l=i+1}^{m} \bar{F}_s^{l-i-1} F (W - W_s)\,\nu(k-l) \right] + \nu(k-i) \tag{A.14}
\]

\[
\xi(k) = \sum_{i=0}^{m} a_i\,\nu_s(k-i) = \sum_{i=0}^{m} a_i \left\{ H\left[ \sum_{l=i+1}^{m} \bar{F}_s^{l-i-1} F (W - W_s)\,\nu(k-l) \right] + \nu(k-i) \right\} \tag{A.15}
\]

\[
= \sum_{l=0}^{m} \left[ a_l I_{n_z} + H \sum_{i=0}^{l-1} a_i \bar{F}_s^{l-i-1} F (W - W_s) \right] \nu(k-l) \tag{A.16}
\]

\[
= \sum_{l=0}^{m} V_l\,\nu(k-l) \tag{A.17}
\]
where

\[
V_l = a_l I_{n_z} + H \sum_{i=0}^{l-1} a_i \bar{F}_s^{l-i-1} F (W - W_s) \tag{A.18}
\]

\[
\xi(z) = \sum_{l=0}^{m} V_l z^{-l}\,\nu(z) \tag{A.19}
\]
Note that we can write ξ(k) as a vector auto-regressive process of infinite order (which can be truncated to Mth order), that is,

\[
\xi(k) = \sum_{j=1}^{\infty} Y_j\,\xi(k-j) + \nu(k) \tag{A.20}
\]

\[
\xi(z) = \left[ I_{n_z} - \sum_{j=1}^{\infty} Y_j z^{-j} \right]^{-1} \nu(z) \tag{A.21}
\]

\[
\left( I_{n_z} - \sum_{j=1}^{\infty} Y_j z^{-j} \right) \sum_{l=0}^{m} V_l z^{-l} = I_{n_z} \tag{A.22}
\]

\[
Y_j = V_j - \sum_{l=1}^{j-1} Y_{j-l} V_l, \qquad j = 0, 1, 2, \ldots, m
\]
\[
Y_j = -\sum_{l=1}^{m} Y_{j-l} V_l, \qquad j = m+1, m+2, \ldots
\]
We can truncate the infinite vector auto-regressive model at M ≫ m. For i = 1, 2, …, M,

\[
E[\xi(k)\xi(k-i)'] = E\left\{ \sum_{j=1}^{M} Y_j\,\xi(k-j)\xi(k-i)' + \nu(k)\xi(k-i)' \right\} \tag{A.23}
\]

Then, we obtain the estimates of \(\{Y_i\}_{i=1}^{M}\) by solving

\[
\sum_{j=1}^{i} Y_j L_{i-j} + \sum_{j=i+1}^{m+i} Y_j L'_{j-i} = L_i, \qquad i = 1, 2, \ldots, m
\]
\[
\sum_{j=i-m}^{i} Y_j L_{m-j+1} + \sum_{j=i+1}^{m+i} Y_j L'_{j-i} = 0, \qquad i = m+1, m+2, \ldots, M
\]

Let ν̂(k) be

\[
\hat{\nu}(k) = \xi(k) - \sum_{j=1}^{M} Y_j\,\xi(k-j) \tag{A.24}
\]

\[
\hat{C}_l = H \sum_{i=0}^{l-1} a_i \bar{F}_s^{l-i-1} F \tag{A.25}
\]

\[
V_0 = I_{n_z} \tag{A.27}
\]

\[
V_l = \sum_{i=0}^{l-1} V_i Y_{l-i}, \qquad l = 1, 2, \ldots, m \tag{A.28}
\]
Recalling (A.18), we have the following relationship. Then,

\[
\mathrm{vec}\begin{bmatrix} \tilde{V}_1 \\ \tilde{V}_2 \\ \vdots \\ \tilde{V}_m \end{bmatrix} = \begin{bmatrix} I_{n_z} \otimes \hat{C}_1 \\ I_{n_z} \otimes \hat{C}_2 \\ \vdots \\ I_{n_z} \otimes \hat{C}_m \end{bmatrix} \mathrm{vec}(W) \tag{A.30}
\]

where the vec(·) function converts W into a column vector as in (5.35) and ⊗ is the Kronecker product. We can obtain the optimal Kalman filter gain W by solving this least squares problem; a unique solution exists if Ĉ_a has full column rank.
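A sketch of this least-squares step follows (our illustration; the lists of Ṽ_l and Ĉ_l matrices are assumed to have been estimated already, and vec(·) is the standard column-major stacking):

```python
import numpy as np

def solve_gain(V_tilde, C_hat, nz):
    """Least-squares solve of (A.30): stack I (x) C_hat_l and vec(V_tilde_l)
    over l = 1..m, then reshape vec(W) back into the (nx, nz) gain W."""
    A = np.vstack([np.kron(np.eye(nz), C) for C in C_hat])
    b = np.concatenate([V.flatten(order='F') for V in V_tilde])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    nx = C_hat[0].shape[1]
    return w.reshape((nx, nz), order='F')
```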
Note that Θ(i), Ψ, and W Ĉ(0) are all functions of W in (5.55). Thus,

\[
\delta J = \frac{1}{2}\,\mathrm{trace}\left\{ \sum_{i=1}^{M-1} \left[ \delta\Theta(i)\,\Omega + \Theta(i)\,\delta\Omega \right] \right\} \tag{A.32}
\]
where

(A.34)

and

\[
\sum_{i=1}^{M-1} \delta\Theta(i)\,\Omega = \sum_{i=1}^{M-1} \Big\{ F'\big[\big(I_{n_x} - (W+\delta W)H\big)'F'\big]^{i-1} H' E^2 H \big[F\big(I_{n_x} - (W+\delta W)H\big)\big]^{i-1} F - F'(\bar{F}')^{i-1} H' E^2 H\, \bar{F}^{i-1} F \Big\}\,\Omega \tag{A.35}
\]

(A.36)

Then,

\[
\sum_{i=1}^{M-1} \delta\Theta(i)\,\Omega \approx -\sum_{i=1}^{M-1} \sum_{\ell=0}^{i-2} \Big[ F'(\bar{F}')^{\ell} H'\,\delta W'\,(\bar{F}')^{i-2-\ell} H' E^2 H\, \bar{F}^{i-1} F + F'(\bar{F}')^{i-1} H' E^2 H\, \bar{F}^{\ell}\,\delta W\, H \bar{F}^{i-2-\ell} F \Big]\,\Omega \tag{A.37}
\]
So,

\[
\frac{1}{2}\,\mathrm{trace}\!\left( \sum_{i=1}^{M-1} \delta\Theta(i)\,\Omega \right) = -\mathrm{trace}\left[ \delta W' \sum_{i=1}^{M-1} \sum_{\ell=0}^{i-2} (\bar{F}')^{i-2-\ell} H' E^2 H\, \bar{F}^{i-1} F\,\Omega\,F'(\bar{F}')^{\ell} H' \right] \tag{A.38}
\]

\[
= -\mathrm{trace}\left[ \delta W' \sum_{i=1}^{M-1} \sum_{\ell=0}^{i-2} (\bar{F}')^{i-2-\ell} H' E^2 C(i) E^2 C(\ell+1)' \right] \tag{A.39}
\]
For \(\sum_{i=1}^{M-1} \Theta(i)\,\delta\Omega\), we have

\[
\sum_{i=1}^{M-1} \Theta(i)\,\delta\Omega = \sum_{i=1}^{M-1} \Theta(i)\left[ \delta P H' - \delta W C(0) \right] E^2 \left( \Psi' - C(0) W' \right) \tag{A.40}
\]
where,

Then,

\[
\frac{1}{2}\,\mathrm{trace}\!\left( \sum_{i=1}^{M-1} \Theta(i)\,\delta\Omega \right) = \mathrm{trace}\Bigg\{ -\delta W' \sum_{i=1}^{M-1} \Theta(i)\left[ \Psi - W C(0) \right] E^2 \hat{C}(0) + \frac{1}{2}\left[ \Theta(i)(\Psi - W C(0)) E^2 H + H' E^2 (\Psi' - C(0) W') \Theta(i) \right] \delta P \Bigg\} \tag{A.43}
\]
Substituting (A.42) into (A.43), we get

\[
\frac{1}{2}\,\mathrm{trace}\!\left( \sum_{i=1}^{M-1} \Theta(i)\,\delta\Omega \right) = -\mathrm{trace}\left\{ \delta W' \sum_{i=1}^{M-1} \Theta(i)(\Psi - W C(0)) E^2 C(0) \right\} - \mathrm{trace}\left\{ \left[ F\,\delta W\,(\Psi' - C(0) W') F' + F (\Psi - W C(0))\,\delta W' F' \right] Z \right\} \tag{A.44}
\]

where

\[
Z = \sum_{b=0}^{\infty} (\bar{F}')^b \left\{ \frac{1}{2} \sum_{i=1}^{M-1} \left[ \Theta(i)(\Psi - W C(0)) E^2 H + H' E^2 (\Psi' - C(0) W') \Theta(i) \right] \right\} \bar{F}^b \tag{A.45}
\]
We can solve for Z via a Lyapunov equation as in (5.62). Then, by substituting Z into (A.44), we have

\[
\frac{1}{2}\,\mathrm{trace}\!\left( \sum_{i=1}^{M-1} \Theta(i)\,\delta\Omega \right) = -\mathrm{trace}\left\{ -\delta W' \left[ \sum_{i=1}^{M-1} \Theta(i) X\, E^2 C(0) + F' Z F X \right] \right\} \tag{A.46}
\]

where X can be estimated using (5.64). Then, by substituting (A.39) and (A.46) into (A.32), we get (5.61).
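Since Z in (A.45) is a convergent sum of the form Σ_b (F̄′)^b M F̄^b, it satisfies the discrete Lyapunov equation Z = F̄′ Z F̄ + M and can be computed as in the following sketch (our illustration; M denotes the bracketed symmetrized term in (A.45)):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def solve_Z(Fbar, M):
    """Solve Z = Fbar' Z Fbar + M, the closed form of the sum in (A.45);
    requires Fbar to be stable so that the sum converges."""
    return solve_discrete_lyapunov(Fbar.T, M)
```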
To solve for R using R3, we first perform a Cholesky decomposition of S⁻¹. That is,

\[
S^{-1} = L L' \tag{A.47}
\]

Then,

\[
L' R S^{-1} R L = (L' R L)^2 = L' G L \tag{A.48}
\]

Let us perform an eigen decomposition on (A.48), that is,

\[
L' G L = U \Lambda U' \tag{A.49}
\]

Then, we have

\[
L' R L = U \Lambda^{1/2} U' \tag{A.50}
\]

so that

\[
R = (L')^{-1} U \Lambda^{1/2} U' L^{-1} \tag{A.51}
\]
Alternatively, to solve for R using R3, we can first perform an eigen decomposition of S⁻¹. That is,

\[
S^{-1} = U_1 \Lambda_1 U_1' \tag{A.52}
\]
\[
= \left( U_1 \Lambda_1^{1/2} U_1' \right)^2 \tag{A.53}
\]
Noting that

\[
S^{-1/2} G S^{-1/2} = \left( S^{-1/2} R S^{-1/2} \right)^2 \tag{A.54}
\]

we perform another eigen decomposition on \( U_1 \Lambda_1^{1/2} U_1'\, G\, U_1 \Lambda_1^{1/2} U_1' \) to get

\[
U_1 \Lambda_1^{1/2} U_1'\, G\, U_1 \Lambda_1^{1/2} U_1' = U_2 \Lambda_2 U_2' \tag{A.55}
\]

and R can be computed as

\[
R = U_1 \Lambda_1^{-1/2} U_1'\, U_2 \Lambda_2^{1/2} U_2'\, U_1 \Lambda_1^{-1/2} U_1' \tag{A.56}
\]
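Both constructions compute the same symmetric square root; a sketch of the Cholesky variant (A.47)–(A.51) follows (our illustration; a small eigenvalue clip guards against numerical negativity):

```python
import numpy as np

def R_from_G(G, S):
    """Recover R from G = R S^-1 R via (A.47)-(A.51): Cholesky
    S^-1 = L L', eigen-decompose L' G L = U Lam U', then
    L' R L = U Lam^(1/2) U' and R = (L')^-1 U Lam^(1/2) U' L^-1."""
    L = np.linalg.cholesky(np.linalg.inv(S))
    lam, U = np.linalg.eigh(L.T @ G @ L)
    root = U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.T
    Linv = np.linalg.inv(L)
    R = Linv.T @ root @ Linv
    return 0.5 * (R + R.T)
```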
Bibliography
[3] Ž. Agić, “K-best spanning tree dependency parsing with verb valency lexicon
reranking,” in 24th International Conference on Computational Linguistics
(COLING 2012), 2012.
[6] W. F. Arnold and A. J. Laub, “Generalized Eigenproblem Algorithms and
Software for Algebraic Riccati Equations,” Proceedings of the IEEE, vol. 72,
no. 12, pp. 1746–1754, 1984.
[9] E. Balas and M. J. Saltzman, “An algorithm for the three-index assignment
problem,” Operations Research, vol. 39, no. 1, pp. 150–161, 1991.
[13] G. Bayler and H. Lewit, “The Navy Operational Global and Regional At-
mospheric Prediction Systems at the Fleet Numerical Oceanography Center,”
Weather and Forecasting, vol. 7, no. 2, pp. 273–279, 1992.
[15] R. Bellman, “On a Routing Problem,” Quarterly of Applied Mathematics, vol. 16,
pp. 87–90, 1958.
[16] J. Berclaz, F. Fleuret, E. Turetken, and P. Fua, “Multiple object tracking using
k-shortest paths optimization,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 33, no. 9, pp. 1806–1819, 2011.
[18] ——, Dynamic Programming and Optimal Control, 2nd ed. Athena Scientific,
Belmont, MA, 1995, vol. 1.
[20] D. P. Bertsekas and D. A. Castanon, “The Auction Algorithm for the Trans-
portation Problem,” Annals of Operations Research, vol. 20, pp. 67–96, 1989.
[21] D. P. Bertsekas, P. Tseng et al., RELAX-IV: A faster version of the RELAX code
for solving minimum cost flow problems. Massachusetts Institute of Technology,
Laboratory for Information and Decision Systems Cambridge, MA, 1994.
[22] D. L. Bessman, “Optimal interdiction of an adaptive smuggler,” Naval Post-
graduate School, Tech. Rep., 2010.
[24] N. Blachman and F. Proschan, “Optimum search for objects having unknown
arrival times,” Operations Research, vol. 7, no. 5, pp. 625–638, 1959.
[30] J. I. Canelon, R. S. Provence, N. Mehta, and L. S. Shieh, “An Alternative
Kalman Innovation Filter Approach for Receiver Position Estimation Based on
GPS Measurements,” International Journal of Systems Science, vol. 38, no. 12,
pp. 977–990, 2007.
[32] P. Carraresi and C. Sodini, “A binary enumeration tree to find k shortest paths,”
in Proc. 7th Symp. operations research, 1983, pp. 177–188.
[35] C.-W. Chen and J.-K. Huang, “Estimation of Steady-State Optimal Filter
Gain from Nonoptimal Kalman Filter Residuals,” Journal of Dynamic Systems,
Measurement, and Control, vol. 116, no. 3, pp. 550–553, 1994.
[36] S. Chen, “Kalman filter for robot vision: a survey,” IEEE Transactions on
Industrial Electronics, vol. 59, no. 11, pp. 4409–4420, 2011.
[37] P. C. Chu, S. E. Miller, and J. A. Hansen, “Fuel-saving ship route using the
Navy’s ensemble meteorological and oceanic forecasts,” The Journal of Defense
Modeling and Simulation: Applications, Methodology, Technology, vol. 12, no. 1,
pp. 41–56, 2013.
[38] J. M. Coutinho-Rodrigues, J. Clímaco, and J. R. Current, “An Interactive Bi-Objective Shortest Path Approach: Searching for Unsupported Nondominated Solutions,” Computers & Operations Research, vol. 26, no. 8, pp. 789–798, Jul. 1999.
[39] I. J. Cox and M. L. Miller, “On Finding Ranked Assignments with Application
to Multitarget Tracking and Motion Correspondence,” IEEE Transactions on
Aerospace and Electronic Systems, vol. 31, no. 1, pp. 486–489, 1995.
[41] P. H. Cullom, “Being Energy Smart Creates More Combat Capability,” National
Defense Magazine, pp. 18–20, July 2015.
[44] A. Das and T. K. Ghoshal, “Market Risk Beta Estimation Using Adaptive
Kalman Filter,” International Journal of Engineering Science and Technology,
vol. 2, no. 6, pp. 1923–1934, 2010.
[45] E. de Queirós Vieira Martins, M. M. B. Pascoal, and J. L. E. D. Santos,
“Deviation algorithms for ranking shortest paths,” International Journal of
Foundations of Computer Science, vol. 10, no. 03, pp. 247–261, 1999.
[51] J. Duchi, E. Hazan, and Y. Singer, “Adaptive Subgradient Methods for Online
Learning and Stochastic Optimization,” Journal of Machine Learning Research,
vol. 12, no. Jul, pp. 2121–2159, 2011.
[52] J. Duník, M. Šimandl, and O. Straka, “Methods for Estimating State and Measurement Noise Covariance Matrices: Aspects and Comparison,” IFAC Proceedings Volumes, vol. 42, no. 10, pp. 372–377, 2009.
[55] A. M. Frieze and J. Yadegar, “An Algorithm for Solving 3-Dimensional Assign-
ment Problems with Application to Scheduling a Teaching Practice,” Palgrave
Macmillan Journal, vol. 32, no. 11, pp. 989–995, Nov. 1981.
[57] H. N. Gabow, “Two algorithms for generating weighted spanning trees in order,”
SIAM Journal on Computing, vol. 6, no. 1, pp. 139–150, 1977.
[59] M. R. Garey and D. S. Johnson, ““Strong” NP-Completeness Results: Motiva-
tion, Examples, and Implications,” Journal of the ACM (JACM), vol. 25, no. 3,
pp. 499–508, 1978.
[65] H. W. Hamacher and G. Ruhe, “On spanning tree problems with multiple
objectives,” Annals of Operations Research, vol. 52, no. 4, pp. 209–230, 1994.
selection and planning,” IEEE Transactions on Systems, Man, and Cybernetics:
Systems, vol. 43, no. 2, pp. 237–251, 2013.
[70] P. E. Hart, N. J. Nilsson, and B. Raphael, “A Formal Basis for the Heuristic
Determination of Minimum Cost Paths,” IEEE Transactions on Systems Science
and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[71] R. Hassin, “Approximation Schemes for the Restricted Shortest Path Problem,”
Mathematics of Operations Research, vol. 17, no. 1, pp. 36–42, Feb. 1992.
[74] H. Heffes, “The Effect of Erroneous Models on the Kalman Filter Response,”
IEEE Transactions on Automatic Control, vol. 11, no. 3, pp. 541–543, 1966.
[75] M. I. Henig, “The Shortest Path Problem With Two Objective Functions,”
European Journal of Operational Research, vol. 25, no. 2, pp. 281–291, May
1986.
[78] W. Hoffman and R. Pavley, “A method for the solution of the n th best path
problem,” Journal of the ACM (JACM), vol. 6, no. 4, pp. 506–514, 1959.
[80] C. Igel and M. Hüsken, “Improving the Rprop Learning Algorithm,” in Proceed-
ings of the Second International ICSC Symposium on Neural Computation (NC
2000), 2000, pp. 115–121.
[82] R. Jonker and A. Volgenant, “A Shortest Augmenting Path Algorithm for Dense
and Sparse Linear Assignment Problems,” Computing, vol. 38, no. 4, pp. 325–340,
Dec. 1987.
[89] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv
preprint arXiv:1412.6980, 2014.
[91] ——, “Stabilizing a Discrete, Constant, Linear System with Application to
Iterative Methods for Solving the Riccati Equation,” IEEE Transactions on
Automatic Control, vol. 19, no. 3, pp. 252–254, 1974.
[93] ——, Search and Screening: General Principles with Historical Applications.
Pergamon Press New York, 1980, vol. 7.
[95] M. Kress, J. O. Royset, and N. Rozen, “The Eye and the Fist: Optimizing
Search and Interdiction,” European Journal of Operational Research, vol. 220,
no. 2, pp. 550–558, 2012.
[96] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval
Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.
[99] J. LeClerc and S. Joslyn, “The Cry Wolf Effect and Weather-Related Decision
Making,” Risk Analysis, vol. 35, no. 3, pp. 385–395, Mar. 2015.
[103] K. P. Logan, “Using a Ship’s Propeller for Hull Condition Monitoring,” Naval
Engineers Journal, vol. 124, no. 1, pp. 71–87, 2012.
[104] M. Lukka, On the optimal searching tracks for a stationary target. Institute
for Applied Mathematics, University of Turku, 1974, no. 4.
[107] L. Mandow and J. L. P. De La Cruz, “Multiobjective A* Search With Consistent
Heuristics,” Journal of the ACM (JACM), vol. 57, no. 5, pp. 1–25, Jun. 2008.
[108] M. Mangel, “Search for a randomly moving object,” SIAM Journal on Applied
Mathematics, vol. 40, no. 2, pp. 327–338, 1981.
[110] P. Matisko and V. Havlena, “Noise Covariance Estimation for Kalman Filter
Tuning using Bayesian Approach and Monte Carlo,” International Journal of
Adaptive Control and Signal Processing, vol. 27, no. 11, pp. 957–973, 2013.
[113] R. Mehra, “On the Identification of Variances and Adaptive Kalman Filtering,”
IEEE Transactions on Automatic Control, vol. 15, no. 2, pp. 175–184, Apr. 1970.
[115] R. Ménard and R. Daley, “The Application of Kalman Smoother Theory to the
Estimation of 4DVAR Error Statistics,” Tellus A, vol. 48, no. 2, pp. 221–237,
1996.
[116] M. L. Miller, H. S. Stone, and I. J. Cox, “Optimizing Murty’s Ranked Assignment
Method,” IEEE Transactions on Aerospace and Electronic Systems, vol. 33,
no. 3, pp. 851–862, Jul. 1997.
[117] S. Miller, “Smart Voyage Planning Model Sensitivity Analysis Using Ocean
and Atmospheric Models Including Ensemble Methods,” Master’s thesis, Naval
Postgraduate School, 2012.
[122] E. F. Moore, “The shortest path through a maze,” in Proceedings of the Interna-
tional Symposium on the Theory of Switching, and Annals of the Computation
Laboratory of Harvard University. Harvard University Press, 1959, pp. 285–292.
[123] J. B. Moore, “Discrete-Time Fixed-Lag Smoothing Algorithms,” Automatica,
vol. 9, no. 2, pp. 163–173, 1973.
[124] K. G. Murty, “An Algorithm for Ranking All the Assignments in Order of
Increasing Cost,” Operations Research, vol. 16, no. 3, pp. 682–687, 1968.
[125] K. Myers and B. Tapley, “Adaptive sequential estimation with unknown noise
statistics,” IEEE Transactions on Automatic Control, vol. 21, no. 4, pp. 520–523,
1976.
[130] L. H. Nunn, “An introduction to the literature of search theory,” Center for
Naval Analyses, Alexandria, VA – Operations Evaluation Group, Tech. Rep.,
1981.
[131] B. J. Odelson, A. Lutz, and J. B. Rawlings, “The autocovariance least-squares
method for estimating covariances: application to model-based control of chemi-
cal reactors,” IEEE Transactions on Control Systems Technology, vol. 14, no. 3,
pp. 532–540, 2006.
[138] M. S. Phadke, Quality engineering using robust design. Prentice Hall PTR,
1995.
[139] M. Q. Phan, F. Vicario, R. W. Longman, and R. Betti, “State-Space Model
and Kalman Filter Gain Identification by a Kalman Filter of a Kalman Filter,”
Journal of Dynamic Systems, Measurement, and Control, vol. 140, no. 3, 2018.
[142] ——, “Optimal search and interdiction planning,” Military Operations Research,
vol. 20, no. 4, pp. 59–73, 2015.
[144] F. Poloni and G. Sbrana, “Closed-Form Results for Vector Moving Average
Models with a Univariate Estimation Approach,” Econometrics and Statistics,
vol. 10, pp. 27–52, 2019.
[145] A. B. Poore and X. Yan, “k-near optimal solutions to improve data association
in multiframe processing,” in SPIE’s International Symposium on Optical Sci-
ence, Engineering, and Instrumentation. International Society for Optics and
Photonics, 1999, pp. 435–443.
[147] R. L. Popp, K. R. Pattipati, and Y. Bar-Shalom, “m-best S-D assignment
algorithm with application to multitarget tracking,” IEEE Transactions on
Aerospace and Electronic Systems, vol. 37, no. 1, pp. 22–39, 2001.
[152] J. O. Royset and H. Sato, “Route optimization for multiple searchers,” Naval
Research Logistics (NRL), vol. 57, no. 8, pp. 701–717, 2010.
[154] V. Sastry, T. Janakiraman, and S. I. Mohideen, “New Algorithms for Multi
Objective Shortest Path Problem,” Opsearch, vol. 40, no. 4, pp. 278–298, Dec.
2003.
[157] R. H. Shumway and D. S. Stoffer, “An Approach to Time Series Smoothing and
Forecasting using the EM Algorithm,” Journal of Time Series Analysis, vol. 3,
no. 4, pp. 253–264, 1982.
for Context-Driven Interdiction Operations in Counter-Smuggling Missions,” in
System Integration (SII), 2014 IEEE/SICE International Symposium on. Tokyo,
Japan, Dec. 2014, pp. 659–664.
[162] H. A. Simon, “Rational Choice and the Structure of the Environment,” Psycho-
logical Review, vol. 63, no. 2, p. 129, Mar. 1956.
[165] B. S. Stewart and C. C. White III, “Multiobjective A*,” Journal of the ACM
(JACM), vol. 38, no. 4, pp. 775–814, Oct. 1991.
[169] T. Tieleman and G. Hinton, “Lecture 6.5-Rmsprop: Divide the Gradient by a
Running Average of its Recent Magnitude,” COURSERA: Neural Networks for
Machine Learning, vol. 4, no. 2, pp. 26–31, 2012.
[171] United Nations Office on Drugs and Crime, “World Drug Report 2010,” 2010.
[173] United States and Joint Chiefs of Staff, Command and Control for Joint Maritime
Operations. Washington, D.C.: Joint Chiefs of Staff, 2013.
[174] E. S. Van der Poort, M. Libura, G. Sierksma, and J. A. van der Veen, “Solving
the k-best traveling salesman problem,” Computers & Operations Research,
vol. 26, no. 4, pp. 409–425, 1999.
[175] C. F. Van Loan and G. H. Golub, Matrix Computations, 4th ed. Johns Hopkins
University Press, 2013.
[178] A. Warburton, “Approximation of Pareto Optima in Multiple-Objective,
Shortest-Path Problems,” Operations Research, vol. 35, no. 1, pp. 70–79, Feb.
1987.
[179] A. Washburn and K. Wood, “Two-person zero-sum games for network interdic-
tion,” Operations Research, vol. 43, no. 2, pp. 243–251, 1995.
[181] T. Whitcomb, “Navy global forecast system, NAVGEM: Distribution and user
support,” in Proceedings of the 2nd Scientific Workshop on ONR DRI: Unified
Parameterization for Extended Range Prediction, 2012.
[185] ——, “Fault diagnosis using distributed PCA architecture,” May 12 2020, US
Patent 10,650,616.
[186] L. Zhang, D. Sidoti, G. V. Avvari, D. F. M. Ayala, M. Mishra, D. L. Kellmeyer,
J. A. Hansen, and K. R. Pattipati, “Context-Aware Dynamic Asset Allocation
for Maritime Surveillance Operations,” Journal of Advances in Information
Fusion, pp. 1–20, 2019.