Abstract
This article applies novel results for infinite- and finite-horizon optimal control problems with nonlinear dynamics and constraints. We use the Valentine transformation to convert a constrained optimal control problem into an unconstrained one and show that the value function is the unique solution of the corresponding Hamilton–Jacobi–Bellman (HJB) equation. From there, we show how to approximate the solution of the original (in)finite-horizon problem with a \(\varGamma \)-convergent family of solutions. Optimal solutions are efficiently obtained via a solver based on Pontryagin’s Principle (PP). The proposed methodology is demonstrated on the path planning problem using the full nonlinear dynamics of an unmanned aerial vehicle (UAV) and an autonomous underwater vehicle (AUV), involving state constraints in 3D environments with obstacles.
1 Introduction
Building upon the recent existence and uniqueness results regarding the Hamilton–Jacobi–Bellman (HJB) equation [2], this article proposes a novel framework for solving a general class of infinite- and finite-horizon optimal control problems with constraints. The class of optimal control problems considered herein includes dynamics of the form \({\dot{x}} = a(t,x) + b(t,x)u\) (additional structure is given in Sect. 3) and performance indices with locally bounded, Lipschitz, and convex integral functions. Ellipsoidal, and therefore convex, state constraint functions are considered and handled via the Valentine transformation. In addition, the admissible control signals belong to time-varying convex sets.
We utilize the off-the-shelf software package DIDO [16, 20], built upon the pseudo-spectral analysis and Pontryagin’s principle (PP), to numerically obtain trajectories/controls starting from various initial positions. Since DIDO outputs at most one trajectory/control per computation, in order to ensure the uniqueness of trajectory/control for a given problem, we impose a smoothness condition on our model. In a nutshell, given a constrained optimal control problem, we extend earlier HJB results devised for unconstrained problems and make a connection with PP to facilitate the application of PP-based solvers.
The proposed framework revolves around infinite-horizon Bolza-type cost functions with running costs exponentially decaying in time. This class of integral functions leads to existence and uniqueness of the HJB solutions [2], and mitigates the HJB solution-seeking process. We show \(\varGamma \)-convergence of solutions with these cost functions to the solutions of the original constrained problems whose running costs do not have to decay exponentially in time. Essentially, we demonstrate how to employ our infinite-horizon framework to approximate the original constrained optimal control problems.
Prevalent approaches toward solving HJB are found in, among others, [1, 2, 7, 13, 15]. Owing to recent advances in computational power, many present-day approaches are based on Dynamic Programming (DP), such as Approximate Dynamic Programming (ADP) and Reinforcement Learning (RL) [6]. ADP refers to model-based approaches, while RL refers to model-free approaches; the approach delineated herein is model-based. However, ADP and RL do not handle constraints well [13]: so-called soft constraints are typically imposed through penalty terms in cost functions, which is yet another incentive behind our framework. A more mature approach, and one more akin to ours, is Model Predictive Control (MPC) [7]. However, the optimality of MPC solutions is typically traded for computational efficiency and stability guarantees. The present work aims to maintain optimality when resorting to numerical solutions of HJB.
This article is a comprehensive version of [23] and complements it in several directions. In particular, (a) dynamics for which control uniqueness cannot be guaranteed (but merely value-function uniqueness) are taken into account, (b) corresponding numerical examples involving autonomous robots (e.g., the AUV and UAV) are included, and (c) all mathematical proofs and pertaining discussions omitted from [23] are enclosed here.
The remainder of this article is organized as follows. Section 2 introduces the utilized notation and definitions. The infinite- and finite-horizon constrained optimal control problems considered herein are formulated in Sect. 3. Section 4 proposes a methodology to successfully solve the problems of interest. Section 5 establishes relations between the solutions obtained in Sect. 4 and the original infinite- and finite-horizon problems from Sect. 3. Section 6 provides linear and nonlinear numerical examples. The conclusions and future research avenues are in Sect. 7. The Appendix collects several technical results that keep the article self-contained.
2 Preliminaries
Going forward, we use the following conventions. We denote by \(|\cdot |\), \(|\cdot |_{\infty }\), \(|\cdot |_{m}\), and \(\langle \cdot ,\cdot \rangle \) the Euclidean norm, the max norm, the matrix norm induced by the Euclidean norm, and the inner product in \(\mathbb {R}^{k}\), respectively. For a nonempty set \(C \subset \mathbb {R}^{k}\), we denote its closure by \({\overline{C}}\) and its boundary by \(\partial C\). For a matrix M, we denote its Moore–Penrose (left) inverse by \(M^{\dag }\).
For row vectors we use \(r = [r_{1}\; \ldots \; r_{N_{x}}]\) and for column vectors we use \(r = [r_{1};\; \ldots ;\; r_{N_{x}}]\), where the \(r_i\)’s are scalars. When the argument of a function \(f:\mathbb {R}\rightarrow \mathbb {R}\) is a row vector r, it means \(f(r) = [f(r_{1})\; \ldots \; f(r_{N_{x}})]\). Likewise, for a column vector r, we have \(f(r) = [f(r_{1});\; \ldots ;\; f(r_{N_{x}})]\). We let I(r) denote the diagonal matrix whose diagonal entries are given by r. Unless stated otherwise, vectors are column vectors.
Define the skew-symmetric matrix \({\hat{x}}\in \mathfrak {so}(3)\) such that for \(x\in \mathbb {R}^{3}\), we have
$$\begin{aligned} {\hat{x}} = \begin{bmatrix} 0 & -x_{3} & x_{2}\\ x_{3} & 0 & -x_{1}\\ -x_{2} & x_{1} & 0 \end{bmatrix}, \quad \text {so that } {\hat{x}}y = x\times y \text { for all } y\in \mathbb {R}^{3}. \end{aligned}$$
Let \(S(x)\in \mathbb {R}^{9\times 9}\), where \(x\in \mathbb {R}^{3}\), be a block diagonal matrix with main diagonal matrices \({\hat{x}}\). For \(x \in \mathbb {R}^{9}\), define \([x]_{3} \in \mathbb {R}^{3\times 3}\) as
Let I and J be closed intervals in \(\mathbb {R}\). Let \(\mu \) be the Lebesgue measure. Denote by \(L^{1}(I;\;J)\) the set of all J-valued Lebesgue integrable functions on I. We say that \(f \in L^{1}_{loc}(I;\;J)\) if \(f \in L^{1}(K;\;J)\) for any compact subset \(K \subset I\). We denote by \({\mathcal {L}}_{loc}\) the set of all functions \(f \in L^{1}_{loc}([0,\infty );\;\mathbb {R}^{+})\) such that \(\lim _{\sigma \rightarrow 0}\sup \limits \{\int _{J}f(\tau )\,d\tau \;: J\subset [0,\infty ),\; \mu (J)\le \sigma \} = 0\). We say that a function is \(C^{1}\) if it is continuous and has continuous first derivatives in all variables. For a function f, denote by \(f^{(l)}\), \(l\in \mathbb {N}\), its \(l^{\text {th}}\) time derivative, and let \(\{f^{(l)}\}\) denote the finite sequence of time derivatives of f from order one up to some order. Suppose g is a function whose evaluation can be related to a sequence, i.e., a series or a max/min function. Let \(g(\{s_l\},K, v)\) be the corresponding value, where \(\{s_l\}\) is the given sequence, K is the length of the sequence, and v is a vector whose components represent scalars that may arise in the computation of \(g(\{s_l\},K, v)\). Note that the lengths of \(\{s_l\}\) and v may differ.
Definition 2.1
A set-valued function \(G: \mathbb {R}^{k} \rightsquigarrow \mathbb {R}^{n}\) is lower semicontinuous if, for all \(x \in \mathbb {R}^{k}\), \(G(x)\subset \text {Lim} \inf \limits _{y\rightarrow x}G(y)\).
Finally, we introduce a form of convergence for functionals.
Definition 2.2
Given a family of functionals \(\{F_{n}: X\times U\rightarrow \mathbb {R} \}\) and a functional \(F: X\times U\rightarrow \mathbb {R}\), we say that \(F_{n}\) \(\varGamma \)-converges to F (denoted \(F_{n}\xrightarrow {\varGamma } F\)) if the following hold:
1. For every \((x,u)\in X\times U\), there exists a sequence \((x_{n}, u_{n})\) such that, as \(n \rightarrow +\infty \), \(F(x,u)\ge \limsup F_{n}(x_{n},u_{n})\) and \((x_{n}, u_{n})\rightarrow (x,u)\) “in some sense” (this will be made precise later on).
2. For every sequence \((x_{n}, u_{n})\) and (x, u) such that \((x_{n}, u_{n})\rightarrow (x,u)\) “in some sense” as \(n \rightarrow +\infty \), we have \(F(x,u)\le \liminf F_{n}(x_{n},u_{n})\).
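To make Definition 2.2 concrete, the following Python sketch numerically checks condition 1 for the exponentially discounted family used later in the article, with \(\lambda \rightarrow 0\) playing the role of \(n\rightarrow \infty \). The running cost, horizon, and sample pair below are illustrative assumptions, not values from the article.

```python
import numpy as np

# Toy check of Definition 2.2 for discounted functionals:
# F_lam(x, u) = integral of l(t, x, u) * exp(-lam * t) over [0, T],
# with an illustrative bounded running cost l(t, x, u) = x^2 + u^2.
T, N = 5.0, 50001
t = np.linspace(0.0, T, N)
dt = t[1] - t[0]

def F(lam, x, u):
    # Riemann-sum approximation of the discounted cost functional.
    return np.sum((x**2 + u**2) * np.exp(-lam * t)) * dt

x, u = 0.5, 0.3
lams = [1.0, 0.1, 0.01, 0.001]       # lam -> 0 plays the role of n -> infinity
vals = [F(lam, x, u) for lam in lams]
limit = F(0.0, x, u)                 # the undiscounted limit functional

# Condition 1 with the constant recovery sequence (x_n, u_n) = (x, u):
# F(x, u) >= limsup F_lam(x, u).
assert all(v <= limit + 1e-9 for v in vals)
# The discounted values increase monotonically toward the limit as lam -> 0.
assert all(a < b for a, b in zip(vals, vals[1:] + [limit + 1e-9]))
```

Here the constant sequence \((x_{n},u_{n}) = (x,u)\) serves as the recovery sequence; for this family the discounted values converge monotonically to the undiscounted cost, which is the mechanism exploited later in Sect. 5.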
2.1 Motivations for Using DIDO
To establish a connection between PP and HJB, we consider the method of characteristics for first-order nonlinear PDEs. For first-order PDEs of the form
$$\begin{aligned} F(x, V, V_{x}) = 0, \quad x\in \varOmega , \end{aligned}$$
with a continuous function \(F: \mathbb {R}^{N_{x}}\times \mathbb {R}\times \mathbb {R}^{N_{x}}\rightarrow \mathbb {R}\) and endpoint data \({\overline{V}}(x)\) on \(\partial \varOmega \), one can construct a solution V by solving the characteristic ODEs (substituting \(p = V_{x}\))
$$\begin{aligned} {\dot{x}} = F_{p},\quad {\dot{V}} = \langle F_{p}, p\rangle ,\quad {\dot{p}} = -F_{x} - F_{V}\,p, \end{aligned}$$
with \(x(0) = y,\; V(0) = V(y),\; p(0) = V_{x}(y)\) for \(y\in \partial \varOmega \). It can be shown ([15, Section 7.2], [5, Chapter 8]) that the characteristic ODEs for HJB are the canonical ODEs for PP. It should be noted that the underlying derivation is based on sufficient smoothness of all functions involved—including V being \(C^{1}\). Fortunately, characteristic ODEs can still be used with weaker assumptions on V ([15, Section 7.2], [5, Chapter 8]).
A side motivation to connect PP and HJB is the following. Controls generated by PP are, in general, open-loop, that is, \(u = u(t)\). Controls given by HJB, in contrast, are closed-loop, that is, \(u = u(t,x)\). Consequently, the concept of real-time optimal control (RTOC) can be used to transform an open-loop control, such as those constructed by DIDO, into a closed-loop control. The idea behind RTOC is to assume that an open-loop control exists beforehand and then to account for nonzero computational time of the optimal trajectory (i.e., updates based on given data). Methods that could be used to transform open-loop controls generated by DIDO into closed-loop controls are found in [19, 21].
3 Problem Formulation
Before stating the problem, let us define our state space for ease of notation. Let \(\mathbb {X}_{s}\subset \mathbb {R}^{N_{s}}\), with \(N_{s} \in \{2,\; 3\}\), be the spatial state space and \(\mathbb {X}_{ns}\subset \mathbb {R}^{k*N_{s}}\), with \(k \in \mathbb {N}\cup \{0\}\), be the non-spatial state space; for \(k=0\), one has \(\mathbb {X}_{ns} = \emptyset \). Let \(\mathbb {X} = \mathbb {X}_{s}\times \mathbb {X}_{ns}\), with both \(\mathbb {X}_{s}\) and \(\mathbb {X}_{ns}\) open, connected, and bounded sets whose cluster points fall within \(\mathbb {X}\) or on \(\partial {\mathbb {X}}\). The dimension of \(\mathbb {X}\) equals \(N_{x} = (k+1)N_{s}\). Here, \(\mathbb {X}_{s}\) represents the spatial region in which a mobile vehicle moves, typically in two or three dimensions, while \(\mathbb {X}_{ns}\) represents the space in which all other variables (i.e., velocities, rotations, etc.) live. For a vector \(x_{ns}(t) = [x_{ns1}(t); \;\ldots ;\; x_{nsk}(t)]\in \mathbb {X}_{ns}\), the components \(x_{nsi}(t)\) are of dimension \(N_{s}\times 1\). We also take the control functions \(t\mapsto u(t)\) to be measurable and admissible, taking values in \(U(t)\subset \mathbb {R}^{N_{u}}\) with U(t) bounded for all \(t\in [t_0, t_f]\).
We consider the following optimal control problem:
$$\begin{aligned} \min \limits _{u}\; J(x,u) = \int ^{t_{f}}_{t_{0}} l(s,x(s),u(s))\,ds + E(x(t_{f}),x_{f}) \end{aligned}$$(3)
subject to
$$\begin{aligned} {\dot{x}}(t) = a(t,x(t)) + b(t,x(t))u(t),\quad x(t_{0}) = x_{0},\quad h^{L} \le h(x_{s}(t)),\quad u(t)\in U(t), \end{aligned}$$(4)
where \(a: [t_{0},t_{f}]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1)N_{s}}\) and \(b:[t_{0}, t_{f}]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1)N_{s}\times N_{u}}\) are defined as follows for \(x = [x_{s};\; x_{ns1}; \;\ldots ; x_{nsk}]\) and some \(K_{o}\in \mathbb {N}\):
with functions \(a_{K_{o}}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{N_{s}},\; a_{e}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1-K_{o})*N_{s}}, \; b_{K_{o}}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{N_{s}\times N_{u}},\text { and } b_{e}: [t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1-K_{o})*N_{s}\times N_{u}}\) bounded, Lipschitz continuous, and componentwise convex in \(x\in \mathbb {X}\) for almost all \(t\ge t_{0}\). Also, we assume the dynamical system \({\dot{x}} = a(t,x)+b(t,x)u\) is controllable in \(\mathbb {X}\) while satisfying all state constraints (the role and actual value of \(K_{o}\) will be made clear in Sect. 4.1). It should be noted that many autonomous system models, such as AUV and UAV, follow this structure. Furthermore, U(t) is a compact and convex subset of \(\mathbb {R}^{N_{u}}\) for each t.
Define \(h: \mathbb {X}_{s} \rightarrow \mathbb {R}^{n_{o}}\) componentwise as
$$\begin{aligned} h_{i}(x_{s}) = \sum ^{N_{s}}_{j=1}\left( \frac{x_{sj}-{\bar{x}}_{ij}}{r_{ij}}\right) ^{2} \end{aligned}$$
for \(i = 1,\ldots ,n_{o}\), where \({\bar{x}}_{i} = ({\bar{x}}_{i1},\; {\bar{x}}_{i2},\;\ldots ,\; {\bar{x}}_{iN_{s}})\) and \(r_{i} = (r_{i1},\; r_{i2},\;\ldots ,\; r_{iN_{s}})\) represent the center and principal axis lengths of an ellipsoid, respectively, and \(n_{o}\) represents the number of state constraints (that is, obstacles) in our model, with \(n_{o}\ge N_{s}\). The case \(0< n_o < N_s\) will be treated in future work. We define \(h^{L}\) componentwise as \(h^{L}_{i} = ({\hat{r}}_{i}+d_{i})^{2}\), where \({\hat{r}}_{i}\) relates to the principal axis lengths of a given obstacle (i.e., if the obstacle is a sphere, \({\hat{r}}_i\) is just its radius) and \(d_{i} > 0\) represents a buffer distance for avoiding obstacles (for example, when mobile agents are represented merely as points, \(d_{i}\) accounts for their actual dimensions so as to avoid collisions). Without loss of generality, it is also assumed that the obstacles are completely contained in \(\mathbb {X}_{s}\) (simply cut the partially contained obstacles out of \(\mathbb {X}_{s}\)). For further clarification, the inequality constraint \(h^{L}\le h(x_s)\) should be interpreted componentwise as \(h^{L}_{i}\le h_{i}(x_s)\) for \(i = 1,\ldots ,n_o\). At this point, we make an additional assumption on \(b_{K_o}(t,x)\): the vector \(\{\nabla _{x}h(x_s)b_{K_o}(t,x)\}^{\dag }\) is convex and Lipschitz continuous in x.
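As a quick numerical illustration, the following sketch implements the componentwise obstacle test \(h^{L}\le h(x_s)\), assuming the standard quadratic ellipsoid form of \(h_i\) (consistent with the gradient matrix D given in Sect. 4). The obstacle centers, axis lengths, and buffer below are hypothetical values, not data from the article.

```python
import numpy as np

# Hypothetical obstacle data: two ellipsoids in R^3 (illustrative values).
centers = np.array([[1.0, 2.0, 0.5],    # xbar_1
                    [4.0, 1.0, 3.0]])   # xbar_2
axes    = np.array([[1.0, 1.5, 1.0],    # r_1 (principal axis lengths)
                    [2.0, 1.0, 1.0]])   # r_2
r_hat   = axes.max(axis=1)              # scalar radius proxy per obstacle
d       = 0.2                           # buffer distance d_i

def h(x_s):
    # h_i(x_s): quadratic ellipsoid function, one component per obstacle.
    return np.sum(((x_s - centers) / axes) ** 2, axis=1)

h_L = (r_hat + d) ** 2                  # h^L_i = (r_hat_i + d_i)^2

def feasible(x_s):
    # Componentwise state constraint h^L <= h(x_s): stay clear of every obstacle.
    return bool(np.all(h_L <= h(x_s)))

assert feasible(np.array([10.0, 10.0, 10.0]))      # far from both obstacles
assert not feasible(np.array([1.0, 2.0, 0.5]))     # at the center of obstacle 1
```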
Let l(t, x, u) be nonnegative, Lipschitz in x, uniformly continuous in u, lower semicontinuous in t, bounded for all \(t \ge t_0\), strictly convex in u, and coercive and bounded from below in x and u. Note that since \(\mathbb {X}\) and U(t) are compact, l(t, x, u) is bounded for any feasible trajectory-control pair. The terminal cost \(E(\cdot )\) for finite-horizon problems is typically positive definite in \(x(t_f)\) with respect to \(x_f\) whereas it is absent in infinite-horizon problems. Clearly, we allow \(t_f\) to be finite or infinite. In what follows, we seek to approximate a solution to problem (3) in a reasonable way.
4 Methodology
In order to approximate a solution to (3), we exploit the following auxiliary infinite-horizon optimal control problem:
$$\begin{aligned} \min \limits _{u}\; \int ^{\infty }_{t_{0}} F_{\lambda }(s,x(s),u(s))\,ds \quad \text {s.t. } {\dot{x}} = a(t,x)+b(t,x)u,\;\; h^{L}\le h(x_{s}),\;\; u(t)\in U(t), \end{aligned}$$(5)
for \(\lambda > 0\), where \(F_{\lambda }(t,x,u) = l(t,x,u)e^{-\lambda t}\).
4.1 Application to State-Constrained Problem
One can transform (5) into an unconstrained problem using the Valentine transformation [18, 22]. Consider a slack function \(\alpha : [t_0, \infty )\rightarrow \mathbb {R}^{n_{o}}\) such that
$$\begin{aligned} {\tilde{h}}(x(t)) + \frac{1}{2}(\alpha (t))^{2} = 0, \end{aligned}$$(6)
where \({\tilde{h}}(x) = h^{L} - h(x)\) and \(\frac{1}{2}(\alpha (t))^{2} = \frac{1}{2}[\alpha _{1}(t)^{2},\ldots ,\alpha _{n_{o}}(t)^{2}]^{\top }\). In [23], the differential equation
is rewritten as
where \(I(\alpha )\) is an \(n_{o}\times n_{o}\) diagonal matrix with the main diagonal equal to \(\alpha \). By the implicit function theorem, [23] writes the control u from (8) as \(u = \varPhi (t,x(t),\alpha (t),{\dot{\alpha }}(t))\) and defines a new control \({\tilde{u}} ={\dot{\alpha }}\). This derivation is possible in [23] due to \(K_{o} = 1\). In general, this might not be the case. For example, consider the quadrotor example in Sect. 6.3 with one obstacle (for simplicity). Here, with \(x_{s} = [x;\; y;\; z] \in \mathbb {R}^{3}\), we have the following case:
where \(I({r}_{-1;1})=\text {diag}\{r_{11}^{-1}; \ldots ; r_{1N_{s}}^{-1}\}\). We note that the time derivative of \({\tilde{h}}\) is not an explicit function of the control (denoted \(\tau \) in the example, the notation more common in the UAV community). In fact, obtaining an expression similar to (8) for the quadrotor example requires the second time derivative of (6). For a general application of the Valentine transformation, one differentiates (6) until the control variable explicitly appears on the left-hand side, that is,
with \(K_{o}\) as the order of differentiation for which this happens (for the first time) and such that \(\partial {\tilde{h}}^{(K_{o})}/\partial u \) (the partial derivative of \({\tilde{h}}^{(K_{o})}\) with respect to u) is nonzero. From the definition of \(K_{o}\), we can replace \({\tilde{h}}^{(K_{o})}(x(t))\) in (9) with \({\tilde{h}}^{(K_{o})}(x(t),u(t))\). For a more detailed derivation, we first show the case with \(n_{o} = 1\) and then follow with the general case of \(n_{o}\) obstacles. This is possible due to the fact that the number of obstacles involved is of no concern to the application of the Valentine transformation (the condition of \(n_o \ge N_{s}\) is necessary for convexity conditions in the proof of Theorem 4.1). First, consider the term \({\tilde{h}}^{(K_{o})}(x, u)\), which is in fact reduced to \({\tilde{h}}^{(K_{o})}(x, u) = -h^{(K_{o})}(x, u)\).
Next,
where
Similarly,
with
and combining (10) and (11) gives us
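The rule defining \(K_{o}\), i.e., differentiate \({\tilde{h}}\) along the dynamics until the control appears explicitly, can be mechanized with symbolic differentiation. The sketch below uses a toy scalar double integrator with a quadratic constraint (all symbols illustrative), for which the control appears at the second derivative, so \(K_{o} = 2\), as in the quadrotor discussion above.

```python
import sympy as sp

# Toy double integrator: dx_s/dt = v, dv/dt = u, with constraint function
# h(x_s) = x_s^2, so h_tilde = h^L - h differs from -h by a constant that
# drops out under differentiation.
t = sp.symbols('t')
u = sp.Function('u')(t)
x_s = sp.Function('x_s')(t)
v = sp.Function('v')(t)
dyn = {sp.Derivative(x_s, t): v, sp.Derivative(v, t): u}

expr, K_o = -x_s**2, 0
while K_o < 5 and not expr.has(u):
    expr = expr.diff(t).subs(dyn)   # differentiate along the dynamics
    K_o += 1                        # stop once the control appears explicitly

assert K_o == 2
```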
For general \(n_{o}\), we define the following matrices for \(t\ge t_{0}\) and \(i \in \{1,\ldots ,n_{o}\}\):
-
\(\alpha ^{(l)} \in \mathbb {R}^{n_{o}}\) such that \(\alpha ^{(l)} = [\alpha ^{(l)}_{1};\ldots ;\alpha ^{(l)}_{n_{o}}]\), \(\alpha ^{0} = \alpha \), and \(\overrightarrow{\alpha } = [\alpha ; \alpha ^{(1)};\ldots ;\alpha ^{(K_{o}-1)}]\),
-
\(D(x_{s}(t))\in \mathbb {R}^{n_{o}\times N_{s}}\) such that
$$\begin{aligned} D_{ij}(x_{s}(t)) = 2\left( \frac{x_{sj}(t)-{\bar{x}}_{ij}}{r^{2}_{ij}}\right) = {\partial _{x_{sj}}h_{i}(x_{s}(t))}, \end{aligned}$$
-
\(\overrightarrow{G}_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r})\in \mathbb {R}^{n_{o}}\) where
$$\begin{aligned} \overrightarrow{G}_{s,i}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r}) = G_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r_{i}}), \end{aligned}$$
-
\(\overrightarrow{G}_{\alpha }({\overrightarrow{\alpha }}; K_{o}) \in \mathbb {R}^{n_{o}}\) where
$$\begin{aligned} \overrightarrow{G}_{\alpha ,i}({\overrightarrow{\alpha }}; K_{o}) = G_{\alpha }({\{\alpha ^{(l)}_{i}\}^{K_{o}-1}_{l = 1}}; K_{o}). \end{aligned}$$
As in [23], we express \(u = \varPhi (t, x, {\overrightarrow{\alpha }}, {\tilde{u}})\) with new control \({\tilde{u}} = \alpha ^{(K_{o})}\) and derive \(\varPhi \) from the generalized version of (12) as
$$\begin{aligned} \varPhi (t, x, \overrightarrow{\alpha }, {\tilde{u}}) = \{D(x_{s})b_{K_{o}}(t,x)\}^{\dag }\left[ -\overrightarrow{G}_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r}) + \overrightarrow{G}_{\alpha }({\overrightarrow{\alpha }}; K_{o}) - D(x_{s}){a_{K_{o}}(t,x)} + I(\alpha ){\tilde{u}}\right] . \end{aligned}$$(13)
With the notation \(\overrightarrow{\alpha } = [\alpha ; \beta _{1};\ldots ;\beta _{K_{o}-1}]\), we reach the following unconstrained problem
$$\begin{aligned} \min \limits _{{\tilde{u}}}\; \int ^{\infty }_{t_{0}} {\tilde{F}}_{\lambda }(s,x(s),\overrightarrow{\alpha }(s),{\tilde{u}}(s))\,ds \end{aligned}$$(14)
such that
$$\begin{aligned} {\dot{x}} = a(t,x)+b(t,x)\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}}),\quad {\dot{\alpha }} = \beta _{1},\quad {\dot{\beta }}_{1} = \beta _{2},\quad \ldots ,\quad {\dot{\beta }}_{K_{o}-1} = {\tilde{u}},\quad (x,\overrightarrow{\alpha })\in \mathbb {X}\times \mathbb {A}, \end{aligned}$$(15)
where \(\mathbb {A}\subset \mathbb {R}^{K_{o}n_{o}}\) and \({\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }, {\tilde{u}}):= F_{\lambda }(t,x,\varPhi (t, x, \overrightarrow{\alpha },{\tilde{u}}))\). Here we note that \(\alpha (t_0) \ne 0\) and that the expressions for \(\alpha ,\; \beta _{1},\; \beta _{2},\) and the like are interpreted componentwise. We also define the corresponding Hamiltonian
From here, the following theorem can be applied with \({\tilde{F}}_{\lambda }\) and \({\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}):= a(t,x)+b(t,x)\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}})\) in place of \(F_{\lambda }\) and \(f(t,x,u):=a(t,x)+b(t,x)u\), respectively. Observe that in both necessity and sufficiency, the existence of a global minimizer \({\tilde{u}}\) for \({\tilde{H}}\) is a condition of interest. In general, while existence can be established with some effort, uniqueness is not guaranteed.
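For intuition on the transformed dynamics, the following numpy sketch assembles the term \(b\{D(x_{s})b_{K_{o}}\}^{\dag }I(\alpha )\) for a toy planar double integrator with two circular obstacles (\(K_{o} = 2\), \(n_{o} = N_{s} = 2\)) and confirms that \({\tilde{f}}\) depends affinely on the new control \({\tilde{u}}\); all numeric values are illustrative assumptions.

```python
import numpy as np

# Toy planar double integrator: x = [x_s; v] in R^4, dx_s/dt = v, dv/dt = u,
# so the constraint must be differentiated twice before u appears (K_o = 2).
# Two circular obstacles give D(x_s) of shape (n_o, N_s) = (2, 2).
x_s = np.array([3.0, 3.0])
centers = np.array([[0.0, 0.0], [6.0, 0.0]])   # illustrative obstacle centers
radii = np.ones((2, 2))                        # illustrative axis lengths

D = 2 * (x_s - centers) / radii**2             # D_ij = partial h_i / partial x_sj
b = np.vstack([np.zeros((2, 2)), np.eye(2)])   # control enters the velocity block
b_Ko = np.eye(2)                               # control block at order K_o
alpha = np.array([0.7, 1.3])                   # nonzero slack values

# The control-dependent factor b {D(x_s) b_Ko}^† I(alpha) of the substitution.
B = b @ np.linalg.pinv(D @ b_Ko) @ np.diag(alpha)

u1, u2 = np.array([1.0, -2.0]), np.array([0.5, 0.5])
# Affine dependence on the new control: superposition holds for the B-term.
assert B.shape == (4, 2)
assert np.allclose(B @ (u1 + u2), B @ u1 + B @ u2)
```

This mirrors the decomposition of \({\tilde{f}}\) into a drift term plus a control-dependent term used in the proof of Theorem 4.1 below.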
Remark 4.1
It should be noted that the above equations (6), (8), (9), and (12) hold along trajectories, with further details found in [18, 22]. Also, observe that for l, a, and b satisfying the assumptions from Sect. 3, the function \({\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}}) = l(t,x,\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}}))\) is convex and Lipschitz in \((x,\overrightarrow{\alpha })\) and strictly convex in \({\tilde{u}}\). Consequently, \({\tilde{F}}_{\lambda }\) inherits the favorable properties of \(F_{\lambda }\).
Remark 4.2
It is possible that a symmetric obstacle configuration exists for which multiple optimal trajectory/control pairs solve a particular instance of (5). We note that, although the map \(\varPhi \) is affine in \({\tilde{u}}\), the Valentine transformation that converts (5) into (14)–(15) can allow the multiple optimal pairs of a given instance of (5) to be represented by the unique optimal pair of the corresponding problem (14)–(15) via restrictions, projections, etc. (Further details will be included in upcoming work.)
4.2 Weak Solutions to HJB
We now present the main theorem of this article, which establishes a unique weak solution of the HJB equation in the class of lower semicontinuous functions vanishing at infinity. For our purposes, the HJB equation is given as follows:
$$\begin{aligned} \partial _{t}V(t,x,\overrightarrow{\alpha }) + \min \limits _{{\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })}\left\{ \langle \nabla _{(x,\overrightarrow{\alpha })}V(t,x,\overrightarrow{\alpha }), {\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})\rangle + {\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}})\right\} = 0. \end{aligned}$$(17)
Theorem 4.1
For the HJB equation derived from (14)–(15), there exists a unique viscosity solution \({\hat{V}}\) vanishing at infinity. In addition, if \({\tilde{H}}\) is continuous and convex in \({\tilde{u}}\), \({\tilde{U}}(t,x,\overrightarrow{\alpha })\) is a compact and convex set and \(K_{o} = 1\), then there exists a unique control \({\hat{u}}\).
Proof
In this proof, we first establish uniqueness of \({\hat{V}}\) by showing that (14)–(15) satisfies a set of assumptions and by using a proof in the style of Proposition 2 from the Appendix. Afterward, we establish existence and uniqueness of the control \({\hat{u}}\) based on the properties of the Hamiltonian \({\tilde{H}}\).
Existence and Uniqueness of the viscosity solution \({\hat{V}}\)
For existence and uniqueness of \({\hat{V}}\), we check our problem against the assumptions required in Proposition 2.
-
A1
We note that both \({\tilde{f}}\) and \({\tilde{F}}_{\lambda }\) are continuous in \((t,{\tilde{u}})\) for all \(x\in \mathbb {X},\, \overrightarrow{\alpha }\in \mathbb {A}\) and thus Lebesgue-Borel measurable in those arguments. For \(\phi \in L^{1}([0,\infty );\mathbb {R})\), let us choose \(\phi (t) = {\tilde{l}}_{-}e^{-\lambda t}\), where \({\tilde{l}}_{-} = \inf \limits _{(t,x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|\).
-
A2
We need to establish a growth bound for a.e. \(t>0\) and for all \((x,\overrightarrow{\alpha })\in \mathbb {X}\times ~ \mathbb {A},\) \({\tilde{u}}(t) \in {\tilde{U}}(t,x,\overrightarrow{\alpha })\). First, we rewrite the dynamics for easier computation of bounds. From (13), we rewrite the dynamic function \({\tilde{f}}\) as
$$\begin{aligned} {\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}) = A(t,x,\overrightarrow{\alpha })+B(t,x,\overrightarrow{\alpha }){\tilde{u}}(t), \end{aligned}$$(18)where
$$\begin{aligned} A(t,x,\overrightarrow{\alpha }) = a(t,x) + b(t,x)\{D(x_{s})b_{{K_o}}(t,x)\}^{\dag }\\ \left[ -\overrightarrow{G}_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r}) + G_{\alpha }({\overrightarrow{\alpha }}; K_{o}) - D(x_{s}){a_{K_{o}}(t,x)}\right] \end{aligned}$$and
$$\begin{aligned} B(t,x,\overrightarrow{\alpha }) = b(t,x)\{D(x_{s})b_{{K_{o}}}(t,x)\}^{\dag }I(\alpha ). \end{aligned}$$Going forward, we note that \({\tilde{f}}\) is bounded in all arguments (a.e. in t); thus, we have
$$\begin{aligned} |{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})| \le K_{{\tilde{f}}}(t)(1+|(x,\overrightarrow{\alpha })|). \end{aligned}$$(19)Next,
$$\begin{aligned} \begin{array}{ll} |{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}})|&{}\le \sup \limits _{(x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|e^{-\lambda t},\\ &{}\le K_{{\tilde{F}}_{\lambda }}(t)(1+|(x,\overrightarrow{\alpha })|), \end{array} \end{aligned}$$(20)where \( K_{{\tilde{F}}_{\lambda }}(t):= \sup \limits _{(x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|e^{-\lambda t}\). Thus, we take \(c \in L^{1}_{loc}([0,\infty );\mathbb {R}^{+})\) as
$$\begin{aligned} c(t) = K_{{\tilde{f}}}(t)+K_{{\tilde{F}}_{\lambda }}(t). \end{aligned}$$ -
A3
Define set-valued maps \({\mathcal {F}}(t,x,\overrightarrow{\alpha }):= \{{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\}\) and \({\mathcal {L}}(t,x,\overrightarrow{\alpha })\) \(:= \{{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }, {\tilde{u}}): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\}\). Since the functions \({\tilde{f}}\) and \({\tilde{F}}_{\lambda }\) are continuous for \((x,\overrightarrow{\alpha })\in \mathbb {X}\times \mathbb {A}\) and t a.e., one has that the maps \({\mathcal {F}}\) and \({\mathcal {L}}\) are continuous. To show the closedness of images, consider the sequence \(\{(x_{n},\overrightarrow{\alpha }_{n})\}\subset \mathbb {X}\times \mathbb {A}\) that converges to a point \(({\bar{x}},\bar{\overrightarrow{\alpha }})\in \mathbb {X}\times \mathbb {A}\). It is clear that
$$\begin{aligned} \begin{array}{c} {\mathcal {F}}(t,x_{n},\overrightarrow{\alpha }_{n})\rightarrow {\mathcal {F}}(t,{\bar{x}},\bar{\overrightarrow{\alpha }}),\\ {\mathcal {L}}(t,x_{n},\overrightarrow{\alpha }_{n})\rightarrow {\mathcal {L}}(t,{\bar{x}},\bar{\overrightarrow{\alpha }}). \end{array} \end{aligned}$$Convexity of the set
$$\begin{aligned} \{({\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}); {\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}}) +r): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha }), r \ge 0\} \end{aligned}$$in \({({\tilde{u}}, r)}\) follows from the convexity of \({\tilde{f}}\) and \({\tilde{F}}_{\lambda }\) in \({\tilde{u}}\) and the affine relationship of r.
-
A4
To satisfy the assumption A4 of Proposition 2, we break this item into finding suitable bounds for \({\tilde{F}}_{\lambda }\) and \({\tilde{f}}\). First, fix t and \({\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\). Next, for \((x,\overrightarrow{\alpha }_{1})\) and \((y,\overrightarrow{\alpha }_{2})\), we proceed with the bounds on \({\tilde{F}}_{\lambda }\). With the assumptions on \({\tilde{l}}\), we have \(|{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }_{1},{\tilde{u}})-{\tilde{F}}_{\lambda }(t,y,\overrightarrow{\alpha }_{2},{\tilde{u}})| \le K_{{\tilde{F}}_{\lambda }}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|.\) In order to establish the appropriate bounds on \({\tilde{f}}\), we consider similar bounds on the matrix-valued functions A and B as given in A2. We note that since a and b are Lipschitz and bounded in x for a.e. \(t \ge t_{0}\), we can establish Lipschitz bounds for A of the form
$$\begin{aligned} |A(t,x,\overrightarrow{\alpha }_{1})-A(t,y,\overrightarrow{\alpha }_{2})|\le K_{A}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|. \end{aligned}$$For the function B, we have
$$\begin{aligned} \begin{array}{l} |(B(t,x,\overrightarrow{\alpha }_{1})-B(t,y,\overrightarrow{\alpha }_{2})){\tilde{u}}(t)|\le K_{B}(t){\tilde{u}}_{b}|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|. \end{array} \end{aligned}$$Here \({\tilde{u}}_{b} = \sup \limits _{{\tilde{u}}\in {\tilde{U}}}|{\tilde{u}}|\) is justified by a coercivity argument similar to the proof of Theorem 7.4.6 in [8], using point A2 and the coercive properties of \({\tilde{F}}_{\lambda }\) in \({\tilde{u}}\). Thus, we conclude
$$\begin{aligned} \begin{array}{l} |{\tilde{f}}(t,x,\overrightarrow{\alpha }_{1},{\tilde{u}})-{\tilde{f}}(t,y,\overrightarrow{\alpha }_{2},{\tilde{u}})|\le C_{{\tilde{f}}}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|,\\ \end{array} \end{aligned}$$(21)where \(C_{{\tilde{f}}}(t) = K_{A}(t) + K_{B}(t){\tilde{u}}_{b}\). In other words, A4 is satisfied with
$$\begin{aligned} k(t) = K_{{\tilde{l}}}(t)e^{-\lambda t} + ~C_{{\tilde{f}}}(t). \end{aligned}$$ -
A5
To show that \(k \in {\mathcal {L}}_{loc}\), we note that the suprema of \(K_{{\tilde{l}}}(t),\,K_{A}(t),\text { and }K_{B}(t)\) are finite. Thus we can write \(k(t) \le 3\max \{\sup \limits _{t}K_{{\tilde{l}}}(t),\, \sup \limits _{t}K_{A}(t),\, \sup \limits _{t}K_{B}(t){\tilde{u}}_{b}\}=: K\). For \(J \subset [t_{0},\infty )\) and \(\sigma > 0\), we have
$$\begin{aligned} \int _{J}k(\tau )d \tau \le K\mu (J), \end{aligned}$$$$\begin{aligned} \sup \limits _{\{J:\mu (J)\le \sigma \}}\int _{J}k(\tau )d \tau \le \sup \limits _{\{J:\mu (J)\le \sigma \}} K\mu (J). \end{aligned}$$(22)Letting \(\sigma \) approach zero shows that the RHS of (22) approaches zero and thus \(k \in {\mathcal {L}}_{loc}\). Moreover, we get
$$\begin{aligned} \begin{array}{rl} \frac{1}{t} \int ^{t}_{t_0}(c(\tau ) +k(\tau ))d \tau &{}\le K\frac{t-t_{0}}{t} + \frac{1}{t}\int ^{t}_{t_0}c(\tau ) d\tau \\ &{}\le (K + c_{max})(1+\frac{t_{0}}{t}). \end{array} \end{aligned}$$(23)As t approaches infinity, we note that the right-hand side of (23) is finite.
-
A6
We know from A2 that for \(t \ge 0\), the closed set \(\mathbb {X}\), and all \(x\in \partial \mathbb {X}\), it follows that \(|{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})| + |{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}})| \le c(t)(1+{\tilde{x}}_{m}),\) where \({\tilde{x}}_{m} = \sup \{|(x,\overrightarrow{\alpha })|: (x,\overrightarrow{\alpha })\in \partial \mathbb {X}\times \partial \mathbb {A}\}\). Taking the supremum over \({\tilde{u}}\in {\tilde{U}}\) yields the assumption A6 with \(q(t) = c(t)(1+{\tilde{x}}_{m})\).
-
A7
By the structure of \({\tilde{F}}_{\lambda }\), the assumption A7 readily follows.
-
A8
The idea of A8 is to show that for a point \(({\bar{x}},\overrightarrow{\alpha })\in \partial \mathbb {X}\times \partial \mathbb {A}\), there exists a trajectory that lies in the interior of \(\mathbb {X}\times \mathbb {A}\) such that \(({\bar{x}},\overrightarrow{\alpha })\) is reached. Prior to problem (14)–(15), we assumed that the dynamics \({\dot{x}} = a(t,x)+b(t,x)u\) are controllable (which satisfies A8 for (5)). Now we need to show that the controllability property holds also for \({\dot{x}} ={\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})\). Let us note that \(\varPhi (t,x,\overrightarrow{\alpha },\cdot )\) is a closed map (in the fourth variable). Since \(\varPhi \) is a closed map in \({\tilde{u}}\) and the original dynamics are controllable, there exists a control \({\tilde{u}}\) such that one can reach some final position \(x_{f}\in \partial \mathbb {X}\) with a trajectory in the interior of \(\mathbb {X}\) whose dynamics are governed by \({\tilde{f}}\).
Having established the assumptions A1–A8, it suffices to show that the value function \({\hat{V}}(t,x,\overrightarrow{\alpha }) =\int ^{\infty }_{t}{\tilde{l}}(s,x,\overrightarrow{\alpha },{\hat{u}})e^{-\lambda s}\,ds\) has the following properties: (i) \({\hat{V}}\) is lower semicontinuous, (ii) Dom \({\hat{V}}(t,\cdot ,\cdot ) \ne \emptyset \) for all large \(t>t_{0}\), and (iii)
$$\begin{aligned} \lim \limits _{t\rightarrow \infty }{\hat{V}}(t,x,\overrightarrow{\alpha }) = 0 \quad \text {for all } (x,\overrightarrow{\alpha })\in \mathbb {X}\times \mathbb {A}. \end{aligned}$$
We know that \({\tilde{l}}\) is at least lower semicontinuous in \(t,\; x, \text { and }\overrightarrow{\alpha }\) and thus (i) is satisfied (e.g., see Remark 4.1). Similarly, since \({\tilde{l}}\) is bounded and defined in all necessary arguments, we have (ii). For (iii),
$$\begin{aligned} {\hat{V}}(t,x,\overrightarrow{\alpha }) \le {\tilde{l}}_{+}\int ^{\infty }_{t}e^{-\lambda s}\,ds = \frac{{\tilde{l}}_{+}}{\lambda }e^{-\lambda t}, \end{aligned}$$
where \({\tilde{l}}_{+} = \sup \limits _{(t,x,\overrightarrow{\alpha })}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\hat{u}})|\). It is clear that the RHS converges to zero as \(t\rightarrow \infty \), and thus we establish (iii). From here, we apply Proposition 2 with \((1) \implies (2)\) and obtain the value function \({\hat{V}}\) as the unique viscosity solution.
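The decay argument for (iii) is easy to sanity-check numerically: since \({\tilde{l}}\) is bounded above by \({\tilde{l}}_{+}\), the tail integral \(\int ^{\infty }_{t}{\tilde{l}}\,e^{-\lambda s}ds\) is dominated by \(({\tilde{l}}_{+}/\lambda )e^{-\lambda t}\). A minimal sketch with an arbitrary bounded running cost (all values illustrative):

```python
import numpy as np

# Bound for property (iii): V_hat(t) = integral over [t, inf) of
# l_tilde(s) e^{-lam s} ds is dominated by (l_plus / lam) e^{-lam t}
# whenever sup |l_tilde| <= l_plus.
lam, l_plus = 0.5, 2.0
s = np.linspace(0.0, 60.0, 600001)          # truncation of [0, inf); illustrative
ds = s[1] - s[0]
l_tilde = l_plus * (0.6 + 0.4 * np.sin(s))  # arbitrary cost with sup = l_plus

for t0 in (0.0, 2.0, 10.0):
    mask = s >= t0
    V_hat = np.sum(l_tilde[mask] * np.exp(-lam * s[mask])) * ds
    # The tail vanishes exponentially, which yields (iii) as t -> infinity.
    assert V_hat <= (l_plus / lam) * np.exp(-lam * t0) + 1e-6
```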
Existence and Uniqueness of \({\hat{u}}\)
Existence and uniqueness of \({\hat{u}}\) depend on the structure of \({\tilde{H}}\) and \({\tilde{U}}(t,x,\overrightarrow{\alpha })\). First, we show continuity and strict convexity of \({\tilde{H}}\) in \({\tilde{u}}\). Both of these conditions are connected to the continuity and convexity of \(\varPhi \) in \({\tilde{u}}\). We note that the function \(\varPhi \) is affine in \({\tilde{u}}\) for all \((t,\,x,\,\overrightarrow{\alpha })\) and thus continuous and convex in \({\tilde{u}}\). By the definitions of \({\tilde{F}}_{\lambda }\) and \({\tilde{f}}\), we know that \({\tilde{F}}_{\lambda }\) is strictly convex and \({\tilde{f}}\) convex in \({\tilde{u}}\). Both functions are also continuous in \({\tilde{u}}\). As a result, \({\tilde{H}}\) is continuous and strictly convex in \({\tilde{u}}\). When \(K_{o} =1\), compactness and convexity of \({\tilde{U}}(t,x,\overrightarrow{\alpha })\) can be established (see [23, Lemma 1]) and \({\tilde{H}}\) admits a unique minimizer \({\hat{u}}\). This concludes the proof. \(\square \)
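The final step, that a continuous, strictly convex Hamiltonian over a compact convex control set admits a unique minimizer, can be illustrated with a one-dimensional toy Hamiltonian; the costate and dynamics values below are illustrative, not from the article.

```python
import numpy as np

# Toy scalar Hamiltonian: H(u) = <p, A + B u> + u^2 is continuous and strictly
# convex in u, so its minimizer over a compact convex set is unique.
p, A, B = 1.5, 0.2, 0.8                # illustrative costate and dynamics data

def H(u):
    return p * (A + B * u) + u**2

U = np.linspace(-1.0, 1.0, 200001)     # compact, convex control set [-1, 1]
u_hat = U[np.argmin(H(U))]             # grid minimizer

# Analytic minimizer of p*B*u + u^2 on [-1, 1]: u* = clip(-p*B/2, -1, 1).
assert abs(u_hat - np.clip(-p * B / 2, -1.0, 1.0)) < 1e-4
```

Strict convexity rules out ties on the grid up to discretization error, mirroring the uniqueness argument for \({\hat{u}}\).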
Remark 4.3
Note that the classical necessary and sufficient conditions for existence of solutions to (5) without constraints can be expressed through PP ([15, pp. 102–103]) and HJB ([15, p. 165]), respectively. We stress that the presented existence and uniqueness results apply to problem (14)–(15) rather than problem (5).
5 Solving the Original Problem
5.1 Convergence to Original Infinite-Horizon Problem
Up to this point, we have analyzed (5). Now we show how to exploit (5) to solve the infinite-horizon version of (3), that is, we set \(t_f = \infty \) and \(E(x(t_f),x_f) \equiv 0\) in (3). Recall that multiple trajectory/control pairs \(\{({\tilde{x}},{\tilde{u}})\}\) may exist that solve the infinite-horizon (3). Given \((x_{\lambda },u_{\lambda })\) as a solution of (5), as guaranteed by Theorem 4.1, we establish the conditions for \((x_{\lambda },u_{\lambda })\) to be a sufficient approximation of a solution of the infinite-horizon (3).
Proposition 1
Consider the set of cost functions \(\{J_{\infty }: \mathbb {X} \times U(t) \times [0,\infty ) \rightarrow \mathbb {R}\}\), where \(J_{\infty }(x,u,\lambda ) = \int ^{\infty }_{t_{0}} l(s,x(s),u(s))e^{-\lambda s}ds\) with dynamics and constraints from (5), and \(J_{\infty }(x,u,0) = \int ^{\infty }_{t_{0}} l(s,x(s),u(s))ds\) (i.e., the cost function for the infinite-horizon variant of (3)). Let \((x_{\lambda },u_{\lambda })\) be the associated trajectory/control pair that minimizes \(J_{\infty }(x,u,\lambda )\) and let l(t, x, u) satisfy the conditions posed in Sect. 3. Then \(J_{\infty }(x,u,\lambda )\xrightarrow {\varGamma }J_{\infty }(x,u,0)\) and there exists a trajectory/control pair \((x^{*}, u^{*})\) such that \((x_{\lambda }, u_{\lambda })\rightarrow (x^{*}, u^{*})\) uniformly and \((x^{*},u^{*})\) solves the infinite-horizon problem (3).
Proof
To show \(\varGamma \)-convergence of \(J(x,u,\lambda )\), we consider the standard arguments based on Definition 2.2. Since \(e^{-\lambda t} \le 1\) for \(\lambda \in [0,\infty )\), we know
$$\begin{aligned} l(t,x(t),u(t))e^{-\lambda t} \le l(t,x(t),u(t)) \end{aligned}$$
and thus
$$\begin{aligned} J(x,u,\lambda ) \le J(x,u,0). \end{aligned}$$
From here, it is clear that
$$\begin{aligned} \limsup _{\epsilon \rightarrow 0^{+}} J(x_{1,\epsilon },u_{1,\epsilon },\epsilon ) \le J(x,u,0) \end{aligned}$$
for some sequence \((x_{1,\epsilon }(t), u_{1,\epsilon }(t))\rightarrow (x(t),u(t))\) uniformly (e.g., the constant sequence \((x_{1,\epsilon },u_{1,\epsilon }) = (x,u)\)). To show the second inequality, we give the following argument. Given \(\lambda _{n}\rightarrow 0\), (x, u) such that \(J(x,u,0)< \infty \), and \((x_{n}, u_{n})\rightarrow (x,u)\) uniformly, then since l is nonnegative and continuous, we have for \(t_{0}< T < \infty \)
$$\begin{aligned} \int ^{T}_{t_{0}} l(s,x_{n}(s),u_{n}(s))e^{-\lambda _{n}s}ds \le J(x_{n},u_{n},\lambda _{n}). \end{aligned}$$
The uniform convergence of \((x_{n}, u_{n})\) to (x, u) on \([t_{0}, T]\), \(\lambda _{n}\rightarrow 0\), and the continuity of l yield
$$\begin{aligned} \lim _{n\rightarrow \infty }\int ^{T}_{t_{0}} l(s,x_{n}(s),u_{n}(s))e^{-\lambda _{n}s}ds = \int ^{T}_{t_{0}} l(s,x(s),u(s))ds, \end{aligned}$$
which means that
$$\begin{aligned} \int ^{T}_{t_{0}} l(s,x(s),u(s))ds \le \liminf _{n\rightarrow \infty } J(x_{n},u_{n},\lambda _{n}). \end{aligned}$$
Then with \(T\rightarrow \infty \), one has
$$\begin{aligned} J(x,u,0) \le \liminf _{n\rightarrow \infty } J(x_{n},u_{n},\lambda _{n}). \end{aligned}$$
Third, we show that there exists a compact set C with \( \inf \limits _{(x, u)} J(x, u,\epsilon ) = \inf \limits _{(x, u)\in C} J(x,u,\epsilon )\) for each \(\epsilon > 0\). We note that, by the properties of the function l(t, x(t), u(t)) and the set \(\mathbb {X}\times U(t)\), \(J(x,u,\epsilon )\) is a bounded function. Thus, for each \(\epsilon > 0\), we can always find such a set C.
Owing to the previous part of the proof and since \({\dot{x}} = a(t,x) + b(t,x)u\) is controllable, there exists a pair \((x^{*}, u^{*})\) such that \(\int ^{\infty }_{t_0}l(s, x^{*}(s), u^{*}(s)) ds\) is finite and minimal. By Theorem 4.1, for \(\lambda > 0\), there exists an optimal trajectory/control pair \((x_{\lambda }, u_{\lambda })\) that solves the corresponding optimal control problem (5) with the corresponding \(\lambda \) value.
Since \(J(x,u,\lambda )\xrightarrow {\varGamma }J(x,u,0)\) and each pair \((x_{\lambda }, u_{\lambda })\) minimizes \(J(x,u,\lambda )\) for a given \(\lambda \), we reach \((x_{\lambda }, u_{\lambda })\rightarrow (x^{*}, u^{*})\) [11, Corollary 7.24]. \(\square \)
Remark 5.1
One can interpret Proposition 1 as characterizing the limit \(\lambda \rightarrow 0\), in the sense that we are determining the efficacy of \((x_{\lambda },u_{\lambda })\) when the cost function lacks the exponential decay in time (i.e., \(\lambda = 0\)).
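The vanishing-discount limit of Proposition 1 can be checked on a toy example. Along a hypothetical, fixed trajectory/control pair with integrable running cost \(l(s) = e^{-s}\) (an assumption chosen only so the undiscounted cost is finite), the discounted costs \(J(\lambda ) = \int _{0}^{\infty }e^{-s}e^{-\lambda s}ds = 1/(1+\lambda )\) increase monotonically toward the undiscounted cost \(J(0) = 1\):

```python
import math
from scipy.integrate import quad

# Vanishing-discount illustration for Proposition 1: along a fixed,
# hypothetical pair, take an integrable running cost l(s) = exp(-s),
# so that J(lam) = ∫_0^∞ exp(-s) exp(-lam s) ds = 1/(1 + lam).
def J(lam):
    val, _ = quad(lambda s: math.exp(-(1.0 + lam) * s), 0.0, math.inf)
    return val

# As lam decreases to 0, the discounted costs increase monotonically
# toward the undiscounted cost J(0) = 1, mirroring the Gamma-limit.
costs = [J(lam) for lam in (1.0, 0.1, 0.01, 0.001)]
assert all(a < b for a, b in zip(costs, costs[1:]))
assert abs(costs[-1] - 1.0) < 1e-2
```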
Remark 5.2
Our approach is similar to the vanishing discount method of ergodic control theory. In [3] and [4], one is required to construct a Lyapunov function to show precompactness of the family of cost functions \(\mathbb {J} = \{J_{\infty }(\cdot ,\cdot ,\lambda )\}\). In short, one must rely on the dynamics of the problem (typically a stochastic differential equation [9]) to obtain convergence for \(\mathbb {J}\) and thus \(\{(x_{\lambda }, u_{\lambda })\}\). In our results, once Theorem 4.1 is applied, convergence depends only on the properties of \(\mathbb {J}\). With respect to state constrained problems [17], additional assumptions are required on the constraints. In this paper, state constraints are handled via the Valentine transformation and the results of Theorem 4.1, rather than with \(\varGamma \)-convergence. It should be noted that \(\varGamma \)-convergence can be used independently of any control framework, i.e., it concerns the behavior of a family of functionals and their respective minimizers.
5.2 Approximation of Original Finite-Time Problem
For many practical problems, finite-horizon formulations are of interest, that is, \(t_f\) is finite and \(E(\cdot ) \not \equiv 0\) in (3). Essentially, we wish to form a relation between the infinite-time horizon problem (5) and the finite-time version of problem (3). Consider the new running cost functional \(l'(t,x,u,\lambda )\) defined as
and thus
We choose \(\gamma (t,u,\lambda )\) such that Theorem 4.1 holds and Proposition 1 is satisfied with the necessary modifications (e.g., restriction of one’s interest to the interval \([t_0,t_f]\)). For example, we use the expression \(\gamma (u,\lambda ) = \lambda e^{\lambda t_f}(u^{T}R_{s}u+1)\), where \(R_s\) is a positive definite matrix. In general, the function \(\gamma \) is not unique, so its choice is a matter of personal preference. In order to assess the proposed approximation numerically, we consider the function
with \(({\hat{x}}, {\hat{u}})\) as the trajectory/control pair which minimizes \(J_{\infty }\). The function \(E_{D}\) acts as a (relative) error function between the given cost function J and the infinite-horizon approximation \(J_{\infty }\).
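The concrete choice \(\gamma (u,\lambda ) = \lambda e^{\lambda t_f}(u^{T}R_{s}u+1)\) above vanishes as \(\lambda \rightarrow 0\) for a fixed horizon and bounded control, so the modified running cost recovers the original one in the limit. A quick numerical check, reusing for concreteness the values \(t_f = 6\) and \(R_s = 0.01\) from the LQR example of Sect. 6.1:

```python
import numpy as np

# Correction term gamma(u, lam) = lam * exp(lam * t_f) * (u' R_s u + 1)
# from Sect. 5.2, with t_f and R_s borrowed from the Sect. 6.1 example.
t_f = 6.0
R_s = 0.01 * np.eye(1)
u = np.array([2.0])  # a sample control value within [-20, 20]

def gamma(lam):
    return float(lam * np.exp(lam * t_f) * (u @ R_s @ u + 1.0))

# For fixed t_f and bounded u, gamma decreases monotonically to 0
# as lam -> 0, since lam * exp(lam * t_f) is increasing in lam.
vals = [gamma(lam) for lam in (0.5, 0.1, 0.01, 0.001)]
assert all(a > b for a, b in zip(vals, vals[1:]))
assert vals[-1] < 1e-2
```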
6 Examples
Suppose we have a bounded and convex set \(\mathbb {X}\) as defined before. We want an autonomous agent to reach a final state \(x_f \in \mathbb {X}\) starting from an initial state \(x_0 \in \mathbb {X}\) while minimizing some cost and staying within \(\mathbb {X}\). In addition, \(\mathbb {X}\) may contain obstacles that the autonomous agent must avoid, and there may be constraints imposed on the motion (e.g., related to the agent path, velocity, or acceleration) and performance of the autonomous agent (e.g., time required to traverse \(\mathbb {X},\) fuel usage). This section deals with numerical results regarding the computation of solutions for such optimal control problems using DIDO. We introduce three path planning examples corresponding to a constrained LQR problem, the full nonlinear AUV dynamics, and the full nonlinear UAV dynamics. In addition, we include finite-time applications and calculations of the suitable error function \(E_{D}\). For computation purposes, we use MATLAB R2018b on a PC running Windows 10, equipped with a 2.10 GHz Intel(R) Xeon(R) Silver 4110 CPU.
6.1 Optimal Path Planning for a LQR Type Problem with Constraints
Let us consider the following LQR problem
subject to
We define \(J_{\infty }(t_{0}, x, u, \lambda )\!:=\! \int ^{\infty }_{t_{0}}(x(s)^{T}Qx(s)\!+\!u(s)^{T}Ru(s))e^{-\lambda s}ds\) and use this in the definition of \(E_{D}\).
We select the following values: \(A = [0\; 1; 2\; -1]\), \(B = [0; 1]\), \(Q = [1\;\; 0; 0\;\; 0.25]\), \(R = 0.5\), \(x_{0} = [4;4]\), and \(x_{f} = [0;0]\). We also include constraints \(U(t) = \{u(t) \in \mathbb {R}: -20 \le u(t) \le 20\}\), \(h_{1}(x(t)) = (x_{1}(t) - 2)^{2} + (x_{2}(t)-1)^{2}\), \(h_{2}(x(t)) =(\frac{x_{1}(t) - 7}{2})^{2} + (x_{2}(t)-1)^{2}\), \(h^{L} = [1.1; 1.1]\).
We notice from Fig. 1.a that the given trajectory satisfies the imposed state constraints and, from Fig. 1.b, we see that the control signal is confined to \([-20, 20]\). The somewhat unusual path in Fig. 1 stems from the dynamics of the problem given by (28)–(29). We include values comparing infinite- and finite-horizon LQR with obstacles. For the finite-time problem, let us choose \(t_f = 6\) s as the transient duration is about 6.9 s. In addition, we consider \(R_s = 0.01\) in our choice for \(\gamma \). For the infinite-time problem, we calculate \(J_{\infty }\) directly with \(l'(t,x(t),u(t),\lambda )\) as defined in (26). The resulting calculations are found in Table 1. As demonstrated, the smaller the value of \(\lambda \), the more accurate the approximation, which is in accordance with Sect. 5.2.
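As an independent cross-check on this trend, note that without the obstacle and control constraints the discounted LQR above admits a closed-form baseline: the substitution \(z = e^{-\lambda s/2}x\), \(v = e^{-\lambda s/2}u\) reduces it to a standard LQR with drift \(A - (\lambda /2)I\), solvable via the continuous algebraic Riccati equation. The sketch below (assuming SciPy; unconstrained, so only indicative of the trend, not of the constrained solution) confirms that the optimal discounted cost increases toward the undiscounted optimum as \(\lambda \rightarrow 0\):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Unconstrained baseline for the discounted LQR of Sect. 6.1. The
# substitution z = exp(-lam*s/2) x, v = exp(-lam*s/2) u converts the
# discounted cost into a standard LQR with shifted drift A - (lam/2) I.
# Obstacles and input bounds are ignored: this is a sanity check only.
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.25]])
R = np.array([[0.5]])
x0 = np.array([4.0, 4.0])

def discounted_cost(lam):
    P = solve_continuous_are(A - 0.5 * lam * np.eye(2), B, Q, R)
    return float(x0 @ P @ x0)  # optimal cost x0' P x0 (t0 = 0)

# The optimal discounted cost grows toward the undiscounted optimum
# as lam -> 0, matching the trend reported in Table 1.
costs = [discounted_cost(lam) for lam in (0.5, 0.1, 0.01)]
assert all(a < b for a, b in zip(costs, costs[1:]))
assert costs[-1] < discounted_cost(0.0)
```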
6.2 Optimal Path Planning for a Fully Nonlinear Model of an AUV in 3D with Obstacles
With respect to the more common notation of rigid-body models found in the literature, this example uses \(\omega \) for the state and \(\tau \) for the control u. Consider the problem
subject to
with \(\omega = [\omega _{s}; \omega _{r}]\) and \(\tau = [\tau _{s}; \tau _{r}]\), where \(\omega _{s}=[x; y; z] \in \mathbb {R}^{3}\) represents the translation states, \(\omega _{r}=[\rho ; \theta ; \psi ] \in [-\pi , \pi ]^{3}\) represents the rotation states (measured in radians), and the controls satisfy \(\tau _{s},\; \tau _{r}\in \mathbb {R}^{3}\). We define m as the mass of the AUV, the \(3\times 3\) matrices \(D_{s}\) and \(D_{r}\) are damping matrices, and \(I_{CG}\) is the inertia matrix, which is symmetric and positive definite, of the form
We also include constraints
We define constraints (32) in the form \(I^{-1}(h_{L})h(x_{s}) \ge I^{-1}(h_{L})h_{L}\) for ease of numerical computation and similarly do so for the remaining example. Furthermore, \(h_{L} = [1.7^{2}; 1.7^{2}; 2.7^{2}; 2.7^{2}]\) is understood with \(d_{i} = 0.7\) (0.5 accounts for the radius of the AUV and 0.2 gives an additional buffer from obstacles; all values are given in meters). Define spaces \(\mathbb {X} = [0,10]^{3}\times [-\pi , \pi ]^3\) and \(\mathbb {U}(t) = [-50, 50]^{6}\). We set values \(Q = \text { diag}\{{\textbf {1}}_{3\times 1}; {{\textbf {50}}}_{3\times 1}\}\), \(R = \text { diag}\{1;10;10;1; 5; 10\}\), \(m = 10,\) \(I_{x} = I_{y} = 3.45\), \(I_{xz} = I_{zx} = -1.28\times 10^{-15}\), \(I_{z} = 2.2\), \(I_{xy} = I_{yx} = I_{yz} = I_{zy} = 0\), \(D_{s} = \text { diag}\{0.1; 5; 5\}\), \(D_{r} = \text { diag}\{5; 5; 0.1\}\), \(\omega _{0} = [{{\textbf {10}}}_{3\times 1};{{\textbf {0}}}_{3\times 1}]\), and \(\omega _{f}= {{\textbf {0}}}_{6\times 1}\). The values for matrix \(I_{CG}\) are taken from the specifications in [10].
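The display giving \(I_{CG}\) is not reproduced here; assuming the standard rigid-body inertia matrix form (rows \([I_{x}, -I_{xy}, -I_{xz}]\), \([-I_{yx}, I_{y}, -I_{yz}]\), \([-I_{zx}, -I_{zy}, I_{z}]\), which is an assumption on our part), the quoted values can be checked for the symmetry and positive definiteness the model requires:

```python
import numpy as np

# Assumed standard rigid-body inertia matrix form; the quoted values
# from Sect. 6.2 are plugged in to verify symmetry and positive
# definiteness of I_CG.
I_x = I_y = 3.45
I_z = 2.2
I_xz = I_zx = -1.28e-15
I_xy = I_yx = I_yz = I_zy = 0.0

I_CG = np.array([
    [I_x,   -I_xy, -I_xz],
    [-I_yx,  I_y,  -I_yz],
    [-I_zx, -I_zy,  I_z],
])

assert np.allclose(I_CG, I_CG.T)               # symmetric
assert np.all(np.linalg.eigvalsh(I_CG) > 0.0)  # positive definite
```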
For the finite-time problem approximation, we choose \(t_{f} = 10\) and also take into consideration the number of algorithm nodes (\(a_{n}\)) used for calculation with DIDO. In DIDO, \(a_{n}\) relates to the discretization of the time interval \([t_{0},\, t_{f}]\) for computation of the cost function \(J(t_0, t_{f}, \omega ,\tau )\). Since model (30)–(31) has \(E(x(t_f),x_{f}) = 0\), the choice of function \(\gamma \) is unnecessary. However, for completeness, we take \(\gamma (t, u,\lambda )= \lambda e^{\lambda t_f}(u^{T}R_{s}u+1)\) with matrix \(R_{s} = 0.02I_{6}.\)
Simulation results for the path planning problem in 3D with 4 obstacles for an AUV are shown in Figs. 2 and 3. As we observe in Fig. 2, as \(\lambda \) approaches 0, both translations and rotations converge to the resulting finite-time behavior. This behavior is also reflected in Table 2 with respect to \(E_{D}\) for both \(a_{n} = 50\) and \(a_{n} = 20\).
6.3 Optimal Path Planning for a Fully Nonlinear Model of an UAV in 3D with Obstacles
Previously, we considered examples where uniqueness of control could be established, i.e., \(K_o = 1\). Now, we look at an example in which that is not guaranteed. Consider the quadrotor example derived from [14]:
subject to
Here \(\omega = [\omega _{s}; \omega _{v}; \omega _{r}; \omega _{rv}]\) with \(\omega _{s}\) as defined in the previous example, \(\omega _{v}\in \mathbb {R}^{3}\) as the translational velocities, \(\omega _{r}\in [-1,1]^{9}\) as the rotational states, \(\omega _{rv} = [{\dot{\rho }}; {\dot{\theta }}; {\dot{\psi }}]\in \mathbb {R}^{3}\) as angular velocities, and control \(\tau \in \mathbb {R}^{4}\). Furthermore, g is the acceleration due to gravity (in \(m/s^2\)), \(M_{\omega _{r}}\) is a \(3\times 4\) matrix with columns \(M_{\omega _{r}}(:,i) = [\omega _{r}]_{3}e_{3}\) for \(i \in \{1,2,3,4\}\), \(e_{3} = [0;0;1]\), and \(\varLambda \) is a \(3\times 4\) matrix of the form
with d as the distance from the center of mass to the center of each rotor (in \(xy-\)plane) and nonzero constant c.
The given constraints for this example are
We take some time to explain the structure of the \(\omega _{r}\) dynamics. In usual quadrotor models, the rotation states (pitch, roll, and yaw) are represented in a standard rotation matrix \(R(\rho , \theta , \psi )\). Unfortunately, the modeling dynamics for \((\rho , \theta , \psi )\) in this form are quite complicated. To work around this, we instead work with a quaternion variant of this rotation matrix. This allows the rotation dynamics to be written as a linear system in the new rotation states (see the \(\omega _{r}\) dynamics). To recover \(\rho \), \(\theta \), and \(\psi \), we can use a suitable transformation that relates to \(\omega _{r}\). For more information, see ([12], Sections 2.2.3–2.2.4).
In addition, we relate the dimension of the control for the quadrotor to that of the AUV. In the standard literature, systems are modeled with six control inputs. We note that in the previous example, \(\tau \) has six components, whereas the quadrotor has four inputs. To conform to this convention, we notice that we can define a new control \(\tau _{new} = [\tau _{s}; \tau _{r}] \in \mathbb {R}^{6}\), where \(\tau _{s}=M_{b}\tau \) and \(\tau _{r} = \varLambda \tau \).
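The display giving \(\varLambda \) is likewise missing above; in the quadrotor model of [14], the map from the four rotor inputs to the body moment is the \(3\times 4\) matrix sketched below, which we assume here for illustration only. It has full row rank, so any desired body moment \(\tau _{r}\) is realizable from \(\tau \in \mathbb {R}^{4}\):

```python
import numpy as np

# Assumed thrust-to-moment map Lambda, following the quadrotor model
# of [14] (Lee et al.): d is the rotor arm length in the xy-plane and
# c the thrust-to-torque coefficient, with the values from Sect. 6.3.
d, c = 0.315, 8.004e-4
Lam = np.array([
    [0.0,  -d,   0.0,  d],
    [d,    0.0,  -d,   0.0],
    [-c,   c,    -c,   c],
])

tau = np.array([10.0, 12.0, 10.0, 12.0])  # sample rotor inputs
tau_r = Lam @ tau                          # body moment tau_r = Lambda tau

# Full row rank: every body moment in R^3 is realizable.
assert np.linalg.matrix_rank(Lam) == 3
```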
Define spaces \(\mathbb {X} = [0,10]^{3}\times [-20, 20]^3\times [-1, 1]^9\times [-20\pi , 20\pi ]^3\) and \(\mathbb {U}(t) = [-50, 50]^{4}\). We set values \(m = 4.34 (kg)\), \(g = 9.8 (m/s^{2})\), \(d = 0.315 (m)\), \(c = 8.004\times 10^{-4} (m)\), \(Q_{f} = I_{3}\), \(Q = \text { diag}\{{{\textbf {0}}.{\textbf {1}}}_{3\times 1}; {{\textbf {1}}}_{3\times 1};\) \({{\textbf {0}}}_{9\times 1}; {{\textbf {0}}.{\textbf {5}}}_{3\times 1}\}\), \(R = \text { diag}\{{\textbf {0}}.{\textbf {5}}_{4\times 1}\}\), \(I_{x} = 2\times 10^{-3}\), \(I_{y} = 8.45\times 10^{-2}\), \(I_{z} = 0.1377\), \(I_{xz} = I_{zx} = I_{xy} = I_{yx} = I_{yz} = I_{zy} = 0\), \(\omega _{s0} = [0; 10; 10]\), \(\omega _{sf}= [10;0;0]\), \(\omega _{v0} = \omega _{vf}= {\textbf {0}}_{3\times 1}\), \([\omega _{r0}]_{3} = [\omega _{rf}]_{3} = I_{3}\), and \(\omega _{rv0} = \omega _{rvf} ={\textbf {0}}_{3\times 1}\).
For the finite-time problem approximation, we choose \(t_{f} = 4\). Unlike the previous examples, here we have a nonzero endpoint function. Thus, we choose matrix \(R_{s} = 0.2I_{4}\) and notice behavior (Fig. 4 and Table 3) similar to the previous example. We note that model (33)–(35) has \(K_{o} = 2\) (see Sect. 4.1). Thus, uniqueness of \({\tilde{u}}\) cannot be guaranteed for any of the \(\lambda \)-approximations of (33)–(35). For more details, see Remark 4.2.
7 Conclusions and Future Work
We have extended the uniqueness and existence results from [23] for a class of optimal control problems with constraints, employing results for the HJB equation and Pontryagin’s Principle as well as \(\varGamma \)-convergence.
The principal research avenue is to investigate stability and robustness of the solutions obtained via the proposed framework. In other words, we do not want to a priori assume that the trajectories live in a bounded set, but the trajectory boundedness for initial conditions of interest ought to be obtained via the proposed optimization. Also, we plan to extend current results to a coverage problem for autonomous agents with sensors such as cameras, sonars or lidars. We may also consider dynamic obstacles (e.g., agents that intersect with floating debris or local wildlife).
Data Availability
We do not analyze or generate datasets owing to the nature of this work. The code can be shared upon reasonable request.
References
Bardi, M., Dolcetta, I.C.: Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhäuser, Basel (2008)
Basco, V., Frankowska, H.: Hamilton–Jacobi–Bellman equations with time-measurable data and infinite horizon. Nonlinear Differ. Equ. Appl. 26, 7 (2019)
Borkar, V., Budhiraja, A.: Ergodic control for constrained diffusions: characterization using HJB equations. SIAM J. Control. Optim. 43(4), 1467–1492 (2004)
Borkar, V., Budhiraja, A.: HJB equations for ergodic control problems for constrained diffusions in polyhedral domains. In: IEEE Conference on Decision and Control, pp. 6668–6672 (2005)
Bressan, A., Piccoli, B.: Introduction to the Mathematical Theory of Control. American Institute of Mathematical Sciences, New York (2007)
Busoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control: performance, stability, and deep approximators. Annu. Rev. Control. 46, 8–28 (2018)
Camacho, E.F., Bordons, C.: Model Predictive Control. Springer, London (2007)
Cannarsa, P., Sinestrari, C.: Semiconcave Functions, Hamilton–Jacobi Equations, and Optimal Control. Birkhäuser, Basel (2004)
Campillo, F.: Optimal ergodic control of nonlinear stochastic systems. In: Kree, P., Wedig, W. (eds.) Probabilistic Methods in Applied Physics, vol. 451, pp. 239–269. Springer Berlin Heidelberg, Berlin (1995)
Chen, J., Sun, C., Zhang, A.: Autonomous navigation for adaptive unmanned underwater vehicles using fiducial markers. In: IEEE International Conference on Robotics and Automation, pp. 9298–9304 (2021)
Dal Maso, G.: An Introduction to \(\Gamma \)-Convergence. Birkhäuser, Basel (1993)
Fossen, T.I.: Handbook of Marine Craft Hydrodynamics and Motion Control. Wiley, New York (2011)
Gorges, D.: Relations between model predictive control and reinforcement learning. IFAC-PapersOnLine 50(1), 4920–4928 (2017)
Lee, T., Leok, M., McClamroch, N.H.: Geometric tracking control of a quadrotor UAV on \(SE(3)\). In: IEEE Conference on Decision and Control, pp. 5420–5425 (2010)
Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University, Princeton (2012)
Marsh, H.C., Karpenko, M., Gong, Q.: A pseudospectral approach to high index DAE optimal control problems. arXiv:1811.12582 (2018)
Picarelli, A.: On Some Stochastic Control Problems with State Constraints. Ecole polytechnique X (2015)
Redfern, D.A., Goh, C.J.: Feedback control of state constrained optimal control problems. In: Doležal, J., Fidler, J. (eds.) System Modelling and Optimization. IFIP—The International Federation for Information Processing, pp. 442–449. Springer, Boston (1996)
Ross, I.M.: A Primer on Pontryagin’s Principle in Optimal Control. Collegiate Publishers, London (2015)
Ross, I. M.: Enhancement to the DIDO optimal control toolbox. arXiv:2004.13112 (2020)
Ross, I.M., Gong, Q., Fahroo, F., Kang, W.: Practical stabilization through real-time optimal control. American Control Conference, pp. 304–309 (2006)
Valentine, F.A.: The problem of Lagrange with differential inequalities as added side conditions. In: Giorgi, G., Kjeldsen, T. (eds.) Traces and Emergence of Nonlinear Programming, pp. 313–357. Birkhäuser, Basel (2014)
Weston, J., Tolić, D., Palunko, I.: Mixed use of Pontryagin’s principle and the Hamilton-Jacobi-Bellman equation in infinite- and finite-horizon constrained optimal control. In: Petrović, I., Menegatti, E., Marković, I. (eds.) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol. 577, pp. 167–185 (2023)
Acknowledgements
This work is supported by SeaClear, European Union’s Horizon 2020 research and innovation programme under grant agreement No. 871295, and SeaClear2.0, European Union’s Horizon Europe innovation action programme under grant agreement No. 101093822. The authors also thank the reviewers for their insights and suggestions for the improvement of this paper.
Communicated by Fausto Gozzi.
Appendix
In order to make this article as self-contained as possible and for the reader’s convenience, we establish some additional notation needed for this section. Let \(B_{r}(x)\) be the closed ball in \(\mathbb {R}^{k}\) of radius \(r > 0\) centered at \(x\in \mathbb {R}^{k}\). For a nonempty set \(C \subset \mathbb {R}^{k}\), we denote its convex hull by co(C). The negative polar cone of C is defined as \(C^{-}:= \{p \in \mathbb {R}^{k}: \langle p,c\rangle \le 0 \text { for all } c\in C\}\). Denote the set distance from \(x\in \mathbb {R}^{k}\) to C by \(d_{C}(x):= \inf \{|x-y|:\; y\in C\}\).
For a nonempty set \(C \subset \mathbb {R}^{k}\), collection of nonempty subsets \(\{C_{w}\}_{w\in C_{2}}\), and \(w_{0}\in C_{2}\), let
and
For \(x \in {\overline{C}} \) define \(T_{C}(x):= \{v\in \mathbb {R}^{k}: \liminf \limits _{h\rightarrow 0^{+}}\;d_{C}(x+hv)/h = 0 \}\) and \(N_{C}(x):= \limsup \limits _{y{\mathop {\rightarrow }\limits ^{C}}x}T_{C}(y)^{-}\). Finally, denote the epigraph of a function \(\varphi \) by epi \(\varphi \).
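The set distance \(d_{C}\) just defined can be illustrated numerically. A minimal sketch for the hypothetical box \(C = [0,1]^{2}\) (chosen only for illustration), where the infimum is attained at the coordinate-wise Euclidean projection:

```python
import numpy as np

# Set distance d_C(x) = inf{|x - y| : y in C} for the convex, compact
# box C = [0, 1]^2. For a box, the minimizing y is the coordinate-wise
# clipping of x onto C (the Euclidean projection).
def d_C(x, lo=0.0, hi=1.0):
    proj = np.clip(x, lo, hi)
    return float(np.linalg.norm(x - proj))

assert d_C(np.array([0.5, 0.5])) == 0.0                   # interior point
assert abs(d_C(np.array([2.0, 0.5])) - 1.0) < 1e-12       # face distance
assert abs(d_C(np.array([2.0, 2.0])) - np.sqrt(2.0)) < 1e-12  # corner
```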
Below we state the necessary assumptions and proposition required for the proof of Theorem 4.1.
-
A1
For all \(x \in \mathbb {R}^{n}\), the mappings \({\tilde{f}}(\cdot , x, \cdot )\) and \({\tilde{F}}(\cdot , x, \cdot )\) are Lebesgue-Borel measurable and there exists \(\phi \in L^{1}([0,\infty );\mathbb {R})\) such that \({\tilde{F}}(t,x,u)\ge \phi (t)\) for a.e. \(t \ge 0\) and all \((x,u) \in \mathbb {R}^{n}\times \mathbb {R}^{m}\).
-
A2
There exists a function \(c \in L^{1}_{loc}([0,\infty ); \mathbb {R}^{+})\) such that for a.e. \(t \ge 0\) and for all \(x \in \mathbb {R}^{n},\; u \in U(t)\)
$$\begin{aligned} |{\tilde{f}}(t,x,u)| + |{\tilde{F}}(t,x,u)| \le c(t)(1+|x|). \end{aligned}$$
-
A3
For a.e. \(t \ge 0\) and all \(x \in \mathbb {R}^{n}\), the set-valued map
$$\begin{aligned} y \rightarrow \{({\tilde{f}}(t,y,u), {\tilde{F}}(t,y,u)): u\in U(t)\} \end{aligned}$$
is continuous with closed images, and the set
$$\begin{aligned} \{({\tilde{f}}(t,x,u), {\tilde{F}}(t,x,u)+r): u \in U(t),\; r\ge 0 \} \end{aligned}$$
is convex.
-
A4
There exists a function \(k \in L^{1}_{loc}([0,\infty ); \mathbb {R}^{+})\) such that for a.e. \(t\ge 0\) and for all \(x,y\in \mathbb {R}^{n},\; u \in U(t)\): \(|{\tilde{f}}(t,x,u) - {\tilde{f}}(t,y,u)| + |{\tilde{F}}(t,x,u)-{\tilde{F}}(t,y,u)| \le k(t)|x-y|.\)
-
A5
\(k\in {\mathcal {L}}_{loc}\) and \(\limsup _{t\rightarrow \infty } \frac{1}{t}\int ^{t}_{0}(c(s)+k(s))ds < \infty .\)
-
A6
There exists a function \(q\in {\mathcal {L}}_{loc}\) such that for a.e. \(t \ge 0\)
$$\begin{aligned} \sup \limits _{u\in U(t)}(|{\tilde{f}}(t,x,u)|+|{\tilde{F}}(t,x,u)|) \le q(t),\; \text {for all } x\in \partial C, \end{aligned}$$
where \(C \subset \mathbb {R}^{n}\) is closed.
-
A7
Let \(V: [0,\infty )\times C \rightarrow \mathbb {R}\cup \{\pm \infty \}\) be the value function of the respective HJB equation with endpoint conditions. Then assume Dom \(V \ne \varnothing \) and there exist \(T > 0\) and \(\psi \in L^{1}([T,\infty ); \mathbb {R}^{+})\) such that for all \((t_{0}, x_{0}) \in \text {Dom }V \cap ([T,\infty )\times \mathbb {R}^{n})\) and any feasible trajectory-control pair \((x(\cdot ), u(\cdot ))\) on \([t_{0}, \infty ),\) with \(x(t_{0}) = x_{0},\)
$$\begin{aligned} |{\tilde{F}}(t, x(t),u(t))| \le \psi (t)\text { for a.e. } t\ge t_{0}. \end{aligned}$$
-
A8
There exist \(\eta ,\; r > 0\) and \(M \ge 0\) such that for a.e. \(t > 0\) and any \(y\in \partial C + \eta B_{1}({{\textbf {0}}}),\) and any \(v \in {\tilde{f}}(t,y,U(t)),\) with \(\inf \limits _{n\in N^{1}_{y,\eta }}\langle n,v \rangle \le 0,\) we can find \(w \in {\tilde{f}}(t,y,U(t))\cap B_{M}(v)\) satisfying
$$\begin{aligned} \inf _{n\in N^{1}_{y,\eta }}\{\langle n,w\rangle ,\; \langle n, w-v \rangle \}\ge r, \end{aligned}$$
where \(N^{1}_{y,\eta }:= \{n \in \partial B_{1}({{\textbf {0}}}): n \in {\overline{co}}N_{C}(x),\; x \in \partial C\cap B_{\eta }(y)\}.\)
Proposition 2
(Theorem 3.3, [2]) Let assumptions A1–A8 be satisfied and let \(W:[0,\infty ) \times C \rightarrow \mathbb {R}\cup \{+\infty \}\) be a lower semicontinuous function such that \(\text {Dom }V(t,\cdot )\subset \text {Dom }W(t,\cdot ) \ne \emptyset \) for all large \(t > 0\) and
Then the following statements are equivalent:
-
1.
W = V;
-
2.
W is a weak (or viscosity) solution of HJB equation on \((0,\infty )\times C\) and \(t \rightarrow \text {epi }W(t,\cdot )\) is locally absolutely continuous.
In addition, V is the unique weak solution satisfying (36) with locally absolutely continuous \(t \rightarrow \text {epi }V(t,\cdot )\).
If we consider \({\tilde{F}}(t,x,u) = {\tilde{l}}(t,x,u)e^{-\lambda t}\) for \(\lambda > 0\) and \({\tilde{l}}\) bounded, this result holds with V as the unique locally Lipschitz continuous solution [2, Prop 5.7].
Weston, J., Tolić, D. & Palunko, I. Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control. J Optim Theory Appl 200, 437–462 (2024). https://doi.org/10.1007/s10957-023-02364-4