Available online at www.sciencedirect.com
Systems & Control Letters 51 (2004) 259-268
www.elsevier.com/locate/sysconle

On infinite-time nonlinear quadratic optimal control

Yue Chen^a, Thomas Edgar^b, Vasilios Manousiouthakis^a,*,1
a Chemical Engineering Department, UCLA, Los Angeles, CA 90095-1592, USA
b Chemical Engineering Department, University of Texas at Austin, TX 78712-1062, USA

Received 21 November 2002; received in revised form 17 July 2003; accepted 11 August 2003

* Corresponding author. Tel.: +1-310-206-0300; fax: +1-310-206-4207.
E-mail addresses: yue@seas.ucla.edu (Y. Chen), tfedgar@austin.utexas.edu (T. Edgar), vasilios@ucla.edu (V. Manousiouthakis).
1 Also can be corresponded to.

Abstract

This work presents an approximate solution method for the infinite-time nonlinear quadratic optimal control problem. The method is applicable to a large class of nonlinear systems and involves solving a Riccati equation and a series of algebraic equations. The conditions for uniqueness and stability of the resulting feedback policy are established. It is shown that the proposed approximation method is useful in determining the region in which the constrained and unconstrained optimal control problems are identical. A reactor control problem is used to illustrate the method.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Nonlinear; Optimal control; HJB equation; Approximate solution; Constraints

0167-6911/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.sysconle.2003.08.006

1. Introduction

The infinite-time nonlinear optimal control problem has been the subject of intense research efforts for a long time. Al'Brekht [1] first considered this problem for analytic objective functions and analytic systems. He demonstrated that the optimal control could be obtained in the form of a power series, the terms of which could be sequentially obtained through solution of a quadratic optimal control problem for the linearized system and subsequent solution of a series of linear differential equations. He was also able to establish the convergence of this power series for single-input systems of the form ẋ = f(x) + Bu. Lee and Markus [12, p. 299] again employed the aforementioned analyticity assumption to establish that the analytic feedback controller u = u*(x) that stabilizes the system ẋ = f(x, u), gives rise to the finite objective function J(x0, u*) = ∫_0^∞ G(x, u*(x)) dt, and satisfies the functional equation (∂J/∂x)(x, u*) (∂f/∂u)(x, u*(x)) + (∂G/∂u)(x, u*(x)) = 0 near the origin is unique and optimal. Lukes, a student of Markus, later relaxed the analyticity condition to second-order differentiability in [13]. Werner and Cruz [18] considered an optimally adaptive control problem, expanded the optimal control as a Taylor series, and proposed a method to identify the coefficients through solution of a series of linear differential equations. Garrard [7] presented a small-ε perturbation procedure to identify suboptimal control laws as power series in ε, for systems of the form ẋ = Ax + εφ(x) + Bu. He also demonstrated that a k-order truncation of the ε power series provides a (2k+1)-order approximation of the optimal control. Nishikawa et al. [15] established a similar result for time-varying systems using an induction proof. Later, in [8], Garrard and Jordan studied systems of the form ẋ = Ax + φ(x) + Bu, with φ(x) polynomial. For objective functions of the form J(x0, u) = ∫_0^∞ (x^T(t) Q x(t) + u^T(t) R u(t)) dt, they were able to employ the Hamilton-Jacobi equation [12, p. 348] to express the optimal control in terms of a power series of the value function.
For a flight control application involving a third-order aircraft model, they were able to evaluate up to the third-order terms of this power series, using a general procedure involving sequential solution of a Riccati equation and a number of linear algebraic equations. Halme and Hamalainen [9] studied the same problem as Garrard and Jordan and presented a solution based on an integral equation formulation of the two-point boundary value problem arising from the necessary conditions of optimality. Freeman and Kokotovic [5,6] employed inverse optimality to establish that control Lyapunov functions are solutions of Hamilton-Jacobi equations associated with sensible cost functionals. In [16], Saridis and Lee proposed a recursive algorithm that was shown to converge to the optimum control law. In [2,3], Beard et al. proposed a Galerkin-based approximation for the solution of a so-called general HJB equation. They reduced the HJB equation to a sequence of linear partial differential equations that they then approximated using the Galerkin spectral method, and established regions of convergence and stability for their solution method. In the same work, a comprehensive review is given of HJB solution methodologies based on Taylor series. The need for solution of a linear partial differential equation for the evaluation of terms beyond third order, and the difficulties associated with estimating the regions of closed-loop stability and power series convergence, are listed as major shortcomings of these techniques. More recently, Manousiouthakis and Chmielewski [14] employed an inverse optimality framework to provide an exact solution for an appropriately defined constrained infinite-time nonlinear optimal control problem.

This paper is organized as follows. In Section 2, the infinite-time nonlinear quadratic optimal control problem is presented and a Taylor series based solution method is discussed.
Conditions are then established that help identify regions of stability for the approximate optimal control strategies. In Section 3, the technique is used to evaluate the region in which the constrained and unconstrained infinite-time optimal control problems have the same solution. Throughout the work, the method is illustrated on a chemical reactor control problem.

2. Unconstrained infinite-time nonlinear quadratic optimal control

In this section, we consider the unconstrained infinite-time nonlinear quadratic optimal control (ITNQOC) problem described by

V(ξ) = inf_{x,u} ∫_0^∞ ( x(t)^T Q(x(t)) x(t) + u(t)^T R(x(t)) u(t) ) dt
s.t. ẋ(t) = f(x(t)) + g(x(t)) u(t), x(0) = ξ,   (1)

where x(t) ∈ R^n, u(t) ∈ R^m, t ∈ [0, ∞). Throughout this work, the following assumptions are employed:

(A1) f(0) = 0, g(0) ≠ 0;
(A2) Q(·), R(·), R^{-1}(·), f(·), g(·), V(·) are analytic, infinitely differentiable functions in R^n;
(A3) V0(x) = 0, V1(x) = 0 ∀x ∈ R^n, where the power series of V(x) is V(x) = Σ_{i=0}^∞ Vi(x);
(A4) (1) admits an optimal control;
(A5) Q0(x) = 0, Q1(x) = 0 ∀x ∈ R^n, where the power series of x^T Q(x) x is x^T Q(x) x = Σ_{i=0}^∞ Qi(x).

Under (A4), the optimal feedback control input is

u(x) = -(1/2) R^{-1}(x) g^T(x) ∂V(x)/∂x,   (2)

where V(·) is the value function of (1), which satisfies the Hamilton-Jacobi-Bellman (HJB) equation [11, p. 418]

x^T Q(x) x + (∂V(x)/∂x)^T f(x) - (1/4) (∂V(x)/∂x)^T g(x) R^{-1}(x) g^T(x) (∂V(x)/∂x) = 0.   (3)

Based on assumptions (A1)-(A5), the functions (·)^T Q(·)(·), G(·) = (1/4) g(·) R^{-1}(·) g(·)^T, f(·), V(·) can be expanded into a power (Taylor) series around the origin, i.e.

x^T Q(x) x = Σ_{i=2}^∞ Qi(x),  G(x) = Σ_{i=0}^∞ Gi(x),  f(x) = Σ_{i=1}^∞ fi(x),  f0(x) = 0,   (4)

V(x) = Σ_{i=2}^∞ Vi(x),  V0(x) = 0,  V1(x) = 0,   (5)

where Qi(x), Gi(x), fi(x), Vi(x) denote scalar or matrix ith-order polynomials, as appropriate.
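To make formulas (2) and (3) concrete, it helps to check them in the linear-quadratic special case, where the HJB equation holds exactly for a quadratic value function. The sketch below uses a hypothetical scalar example (ẋ = ax + bu with a = -1, b = 1, q = r = 1, not taken from the paper): V(x) = p x^2 with p the positive root of the scalar Riccati equation makes the residual of (3) vanish identically, and (2) reduces to the familiar linear feedback.

```python
import numpy as np

# Linear-quadratic special case of the HJB equation (3) in one dimension:
#   q x^2 + V'(x) (a x) - (1/4) (b^2/r) (V'(x))^2 = 0
# for xdot = a x + b u (hypothetical example: a = -1, b = 1, q = r = 1).
a, b, q, r = -1.0, 1.0, 1.0, 1.0

# V(x) = p x^2, with p the positive root of the scalar Riccati equation
#   q + 2 a p - (b^2/r) p^2 = 0.
p = (a + np.sqrt(a**2 + q * b**2 / r)) * r / b**2

for x in np.linspace(-2.0, 2.0, 9):
    dV = 2.0 * p * x
    residual = q * x**2 + dV * a * x - 0.25 * b**2 / r * dV**2
    assert abs(residual) < 1e-12   # HJB (3) holds identically for this V
    u = -0.5 / r * b * dV          # feedback law (2) reduces to u = -(b p / r) x
```

For a = -1, q = r = b = 1 the root is p = sqrt(2) - 1, so the optimal feedback is u = -(sqrt(2) - 1) x.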
As an illustration, for a two-dimensional system, the third-order term in (5) can be written as V3(x) = V30 x1^3 + V21 x1^2 x2 + V12 x1 x2^2 + V03 x2^3. Substituting (4) and (5) into the HJB equation (3) results in

Σ_{i=2}^∞ Qi(x) + (Σ_{i=2}^∞ ∂Vi(x)/∂x)^T (Σ_{j=1}^∞ fj(x)) - (Σ_{i=2}^∞ ∂Vi(x)/∂x)^T (Σ_{j=0}^∞ Gj(x)) (Σ_{k=2}^∞ ∂Vk(x)/∂x) = 0.   (6)

For (6) to hold for all x, it is necessary and sufficient that the terms of each order be zero. Considering that (A3) holds, the zeroth-order and first-order terms are automatically satisfied. In turn this implies:

- second-order term of (6):

Q2(x) + (∂V2(x)/∂x)^T f1(x) - (∂V2(x)/∂x)^T G0(x) (∂V2(x)/∂x) = 0;   (7)

- ℓth-order (ℓ ≥ 3) term of (6):

Q_ℓ(x) + Σ_{i=2}^{ℓ} (∂Vi(x)/∂x)^T [ f_{ℓ-i+1}(x) - Σ_{j=0}^{ℓ-i} Gj(x) ∂V_{ℓ-i-j+2}(x)/∂x ] = 0.   (8)

Isolating the highest-order term ∂V_ℓ(x)/∂x in (8) then yields

(∂V_ℓ(x)/∂x)^T [ f1(x) - 2 G0(x) ∂V2(x)/∂x ]
  = - Q_ℓ(x) - Σ_{i=3}^{ℓ-1} (∂Vi(x)/∂x)^T [ f_{ℓ-i+1}(x) - Σ_{j=0}^{ℓ-i} Gj(x) ∂V_{ℓ-i-j+2}(x)/∂x ]
    - (∂V2(x)/∂x)^T [ f_{ℓ-1}(x) - Σ_{j=1}^{ℓ-2} Gj(x) ∂V_{ℓ-j}(x)/∂x ]
  ≜ S_ℓ(x).   (9)

The structure of Eqs. (7) and (9) can best be appreciated by considering the optimal control problem

Ṽ(ξ) = inf_{x̃,ũ} ∫_0^∞ ( x̃(t)^T Q̃ x̃(t) + ũ(t)^T R̃ ũ(t) ) dt, s.t. dx̃(t)/dt = Ã x̃(t) + B̃ ũ(t), x̃(0) = ξ,   (10)

where x̃(t) ∈ R^n, ũ(t) ∈ R^m ∀t ∈ [0, ∞); Ṽ, Q̃, R̃ are constant symmetric matrices such that Q̃ = Q(0), R̃ = R(0), and

x^T Ṽ x = V2(x),  x^T Q̃ x = Q2(x)  ∀x ∈ R^n,   (11)

and Ã, B̃ are constant matrices of appropriate dimensions such that

Ã x = f1(x) ∀x ∈ R^n,  B̃ = g(0).   (12)

Considering the additional assumption

(A6) Q̃ > 0, R̃ > 0, (Ã, B̃) controllable,

the optimal control policy for (10) exists, and is stabilizing, unique and equal to ũ(x̃) = -R̃^{-1} B̃^T Ṽ x̃, where Ṽ is the unique positive definite solution of the Riccati equation (16).
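The order-by-order recursion (7)-(9) is easiest to see in the scalar case, where each ith-order term Vi(x) reduces to a single coefficient. The sketch below works through a hypothetical one-dimensional example (ẋ = -x + x^2 + u with q = r = 1, not from the paper): the order-2 step is the scalar Riccati equation (the analogue of (7)), and each subsequent order is a single linear equation in v_ℓ (the analogue of (9)); the truncation order L and the coefficient-array representation are implementation choices.

```python
import numpy as np

# Power-series solution of the scalar HJB equation
#   q x^2 + V'(x) f(x) - (1/4) (b^2/r) (V'(x))^2 = 0
# for xdot = f(x) + b u, with f(x) = -x + x^2 (hypothetical example).
# v[i] is the coefficient of x^i in V(x) = sum_i v[i] x^i; V0 = V1 = 0.
q, r, b = 1.0, 1.0, 1.0
f = np.array([0.0, -1.0, 1.0])   # coefficients of x^0, x^1, x^2 in f(x)
L = 6                            # truncation order
v = np.zeros(L + 1)

# Order 2 -- the scalar Riccati equation (cf. (7)):
#   q + 2 f1 v2 - (b^2/r) v2^2 = 0, positive root.
v[2] = (f[1] + np.sqrt(f[1]**2 + q * b**2 / r)) * r / b**2

for ell in range(3, L + 1):
    # Known part of the x^ell coefficient of V' f - (1/4)(b^2/r)(V')^2,
    # built only from the already-computed v[2..ell-1] (the S_ell of (9)).
    dV = [i * v[i] for i in range(1, ell)]   # V' coeffs of x^0..x^(ell-2)
    known = sum(dV[i] * f[j]
                for i in range(len(dV)) for j in range(len(f)) if i + j == ell)
    known -= 0.25 * b**2 / r * sum(dV[i] * dV[j]
                for i in range(len(dV)) for j in range(len(dV)) if i + j == ell)
    # The unknown enters as ell * v[ell] x^(ell-1) times the closed-loop
    # linear field (f1 - (b^2/r) v2) x, exactly the left side of (9).
    v[ell] = -known / (ell * (f[1] - b**2 / r * v[2]))

# For this example: v[2] = sqrt(2)-1, v[3] = (2-sqrt(2))/3, v[4] = sqrt(2)/16.
print(v[2], v[3], v[4])
```

Note that the multiplier of the unknown v_ℓ is the same closed-loop linear field at every order ℓ, which is exactly the observation made below Eq. (16) in the text.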
Indeed, based on (10)-(12), it holds that

∂V2(x̃)/∂x̃ = 2 Ṽ x̃,   (13)

G0(x̃) = (1/4) g(0) R^{-1}(0) g^T(0) = (1/4) B̃ R̃^{-1} B̃^T.   (14)

Thus, substituting (11)-(14) into (7), we obtain

x̃^T ( Q̃ + Ṽ Ã + Ã^T Ṽ - Ṽ B̃ R̃^{-1} B̃^T Ṽ ) x̃ = 0.   (15)

Since (15) must hold for all x̃ ≠ 0, it is equivalent to

Q̃ + Ṽ Ã + Ã^T Ṽ - Ṽ B̃ R̃^{-1} B̃^T Ṽ = 0.   (16)

This implies that (7) is equivalent to the Riccati equation corresponding to the optimal control problem (10) for the linearization of (1). This same linearization (10) also plays an important role in (9). Indeed, it is important to note that f1(x) - 2 G0(x) ∂V2(x)/∂x = (Ã - B̃ R̃^{-1} B̃^T Ṽ) x ≜ Ā x remains the same for all ℓ, and that it is equal to the vector field that determines the closed-loop dynamics for the linearization of (1) with constant objective function weights, i.e. problem (10). Examining (9) closely reveals that both its left- and right-hand sides are polynomials of degree ℓ. Thus, equating the coefficients of corresponding terms in (9) gives rise to a set of linear equations in terms of the coefficients of V_ℓ(x), which is denoted as

Γ_ℓ V_ℓ = S_ℓ,   (17)

where Γ_ℓ denotes a known square matrix whose elements are known linear functions of the entries of Ā, V_ℓ is a vector whose elements are all the coefficients of V_ℓ(x), and S_ℓ is a known vector whose entries depend on the coefficients of the value function polynomials Vi(x) of order i lower than ℓ. The following theorem can then be stated.

Theorem 1. If (A1)-(A6) hold, then the HJB equation (3) admits a unique solution V(x) > 0 ∀x ∈ R^n - {0}, which is a positive definite function near the origin. Furthermore, the coefficients of the power series expansion of this solution can be identified through solution first of (7) and then sequential solution of (9) (or (17)) for ever increasing values of ℓ.
Finally, if the Taylor series of the value function V(x) is truncated at the ℓ-order term, then the (ℓ-1)-order terms of the obtained solution for this ℓ-order approximation are the same as the (ℓ-1)-order terms of both the obtained solution for the (ℓ-1)-order approximation and of the exact solution of the HJB equation.

Table 1
The number of coefficients of the ℓth-order term of the value function (V_ℓ(x))

n \ ℓ     2        3        4        5
4         10       20       35       56
10        55       220      715      2002
30        465      4960     40,920   278,256
50        1275     22,100   292,825  3,162,510

Proof. We first establish that (9) admits a unique solution. To that purpose, we show that V_ℓ(x) is unique for any ℓ. We proceed by contradiction. Let ℓ be the lowest order such that (9) admits two solutions V̄_ℓ(x), V̂_ℓ(x) that differ at least at some point x̃0 ∈ R^n. Clearly, ℓ must be greater than or equal to 3, since V2(x) is unique based on (A6), (13) and (16). It then holds that

(∂(V̄_ℓ(x) - V̂_ℓ(x))/∂x)^T [ f1(x) - 2 G0(x) ∂V2(x)/∂x ] = 0 ⇔ (∂(V̄_ℓ(x) - V̂_ℓ(x))/∂x)^T Ā x = 0.

Applying the above equation to the linearized closed-loop system trajectory x̃(t) starting at x̃0 (see (10), with ξ = x̃0) then yields

d(V̄_ℓ(x̃(t)) - V̂_ℓ(x̃(t)))/dt = (∂(V̄_ℓ(x̃) - V̂_ℓ(x̃))/∂x̃)^T dx̃/dt = 0 ⇔ V̄_ℓ(x̃(t)) - V̂_ℓ(x̃(t)) = constant ∀t.

However, the linearized closed-loop system dx̃/dt = Ā x̃, x̃(0) = x̃0, is asymptotically stable because of (A6). Therefore lim_{t→∞} x̃(t) = 0, and thus the above constant is zero, since lim_{t→∞} [V̄_ℓ(x̃(t)) - V̂_ℓ(x̃(t))] = V̄_ℓ(0) - V̂_ℓ(0) = 0. Thus V̄_ℓ(x̃(t)) = V̂_ℓ(x̃(t)) ⇒ V̄_ℓ(x̃0) = V̂_ℓ(x̃0), which is a contradiction.

To establish positive definiteness of V(x) near the origin, we recollect that under assumptions (A1)-(A6), solving the Riccati equation (16) leads to a positive definite solution Ṽ.
Since all higher-order terms of V(x) can be neglected in a small enough neighborhood of the origin, V(x) is a positive definite function near the origin.

As stated above, V2(x) is unique based on (A6), (13) and (16). Then the coefficients of V3(x) can be solved for uniquely using (17). Carrying on this procedure iteratively until order (ℓ-1), the coefficients of V3(x) through V_{ℓ-1}(x) can be solved for uniquely based on (17).

Based on the above iterative solution procedure, it is also easy to verify that if the value function V(x) is truncated with an ℓ-order polynomial, it is sufficient to expand f(x), G(x) and x^T Q(x) x in (4) up to at most an ℓ-order polynomial, since any terms of f(x), G(x) and Q(x) of order higher than ℓ will not appear in (9).

In order to assess the growth of computational complexity with the order of approximation and the dimension of the system, we calculate the total number of variables involved in all terms and in the ℓ-order term of the value function. The number of coefficients involved in the ℓ-order term of the value function (i.e. the dimension of V_ℓ in (17)) is N_ℓ = Σ_{i=1}^{K} C_i^n C_{i-1}^{ℓ-1}, K = min(n, ℓ). N_ℓ is tabulated for some n, ℓ in Table 1, where the row index is n and the column index is ℓ. The total number of coefficients involved in the first ℓ terms of the value function (i.e. effectively from 2 to ℓ, since V0(x) = 0, V1(x) = 0) is N = Σ_{i=0}^{K} C_i^n C_i^ℓ.

Having outlined a systematic method for the computation of the ℓth-order term of the value function V_ℓ(x), we now proceed to quantify the closed-loop stability region for the associated control law

u_ℓ(x) = -(1/2) R^{-1}(x) g^T(x) Σ_{i=2}^{ℓ} ∂Vi(x)/∂x.   (18)

To establish stability, we first define the following two sets.

Definition 1. D_ℓ is the set in which

(Σ_{i=2}^{ℓ} ∂Vi(x)/∂x)^T [ f(x) - (1/2) g(x) R^{-1}(x) g^T(x) Σ_{i=2}^{ℓ} ∂Vi(x)/∂x ] < 0

and

Σ_{i=2}^{ℓ} Vi(x) > 0  ∀x ≠ 0.

Definition 2.
The a-level set of the ℓth-order approximately optimal value function is the connected set that contains the origin and is defined as

D_a^ℓ - {0} = { x ∈ R^n | 0 < Σ_{i=2}^{ℓ} Vi(x) ≤ a },  a > 0.

Then the following holds.

Theorem 2. If assumptions (A1)-(A6) hold, then the feedback control given by the above approximation method renders the origin asymptotically stable for all initial conditions in the set D_a^ℓ, as long as D_a^ℓ ⊂ D_ℓ.

Proof. Employing the ℓ-order approximate control u_ℓ(x) = -(1/2) R^{-1}(x) g^T(x) Σ_{i=2}^{ℓ} ∂Vi(x)/∂x (Eq. (18)) gives rise to the following closed-loop system:

ẋ = f(x) - (1/2) g(x) R^{-1}(x) g^T(x) Σ_{i=2}^{ℓ} ∂Vi(x)/∂x.   (19)

The derivative of Σ_{i=2}^{ℓ} Vi(x) along the trajectories of (19), denoted as Σ_{i=2}^{ℓ} V̇i(x), is given by

Σ_{i=2}^{ℓ} V̇i(x) = (Σ_{i=2}^{ℓ} ∂Vi(x)/∂x)^T [ f(x) - (1/2) g(x) R^{-1}(x) g^T(x) Σ_{i=2}^{ℓ} ∂Vi(x)/∂x ].   (20)

But then Σ_{i=2}^{ℓ} Vi(0) = 0, Σ_{i=2}^{ℓ} Vi(x) > 0 in D_a^ℓ - {0}, and Σ_{i=2}^{ℓ} V̇i(x) < 0 in D_a^ℓ - {0}. Since the origin is an equilibrium point of (19) and D_a^ℓ contains the origin, application of Theorem 3.1 of [10, p. 100] yields that the origin is asymptotically stable with D_a^ℓ as a region of attraction [10, p. 109].

The proposed approximation procedure and associated stability results are illustrated on a chemical reactor control problem.

Example 1 (Manousiouthakis and Chmielewski [14]). Consider a continuously stirred tank reactor (CSTR) governed by the following system of equations:

ẋ1 = -0.01 x1^2 - 0.338 x1 + 0.02 x2 + 0.02 u,
ẋ2 = 0.05 x1^2 + 0.159 x1 - 0.03 x2.   (21)

The performance index to be optimized is in the form of (1) with weight matrices

Q = [ 10  0 ; 0  1 ],  R = 1.

First, the positive definite solution of the Riccati equation (16) is identified. The resulting second-order approximate value function and corresponding control are

V2(x) = 19.6527 x1^2 + 21.6336 x1 x2 + 23.0978 x2^2,   (22)

u2(x) = -0.3931 x1 - 0.2163 x2.   (23)
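The numbers in (22)-(23) can be reproduced from the linearization of (21) alone. The sketch below solves the Riccati equation (16) for the CSTR linearization via the stable invariant subspace of the associated Hamiltonian matrix; this subspace construction is a standard technique, not spelled out in the paper, and numpy is assumed to be available.

```python
import numpy as np

# Linearization of the CSTR model (21) at the origin, with the
# weights Q, R of Example 1.
A = np.array([[-0.338,  0.02],
              [ 0.159, -0.03]])
B = np.array([[0.02],
              [0.0 ]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

# Solve Q + V A + A^T V - V B R^{-1} B^T V = 0 (Eq. (16)) via the
# stable invariant subspace of the Hamiltonian matrix.
Rinv = np.linalg.inv(R)
H = np.block([[ A, -B @ Rinv @ B.T],
              [-Q, -A.T           ]])
w, U = np.linalg.eig(H)
Us = U[:, w.real < 0]                 # eigenvectors of the stable eigenvalues
n = A.shape[0]
V = np.real(Us[n:, :] @ np.linalg.inv(Us[:n, :]))

K = Rinv @ B.T @ V                    # u2(x) = -K x, cf. (23)
print(np.round(V, 4))                 # V2(x) = x^T V x, cf. (22)
print(np.round(K, 4))                 # gains close to 0.3931 and 0.2163
```

The off-diagonal entry of V is half the x1 x2 coefficient in (22), since V2(x) = x^T V x with V symmetric.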
Fig. 1. Closed-loop simulations for the fifth-order approximate nonlinear optimal controller (initial condition x1 = -1.5, x2 = 3); the panels show x1, x2 and the input u over t ∈ [0, 300].

Then sequential solution of a series of algebraic equations allows computation of any required order approximate value function and control. The fifth-order approximate value function and corresponding control are

Σ_{i=2}^{5} Vi(x) = 19.6527 x1^2 + 21.6336 x1 x2 + 23.0978 x2^2 + 1.1399 x1^3 + 3.1079 x1^2 x2 + 0.3018 x1 x2^2
  + 0.05256 x2^3 + 0.0833 x1^4 - 0.0295 x1^3 x2 - 0.00164 x1^2 x2^2 - 0.00134 x1 x2^3 - 0.0003 x2^4
  - 0.0029 x1^5 + 8.07×10^-5 x1^4 x2 - 8.12×10^-5 x1^3 x2^2 - 3.63×10^-5 x1^2 x2^3
  + 0.304×10^-5 x1 x2^4 + 0.86×10^-6 x2^5,   (24)

u5(x) = -0.3931 x1 - 0.2163 x2 - 0.0342 x1^2 - 0.0622 x1 x2 - 3.018×10^-3 x2^2
  - 0.00333 x1^3 + 0.000885 x1^2 x2 + 0.0000328 x1 x2^2 + 0.0000135 x2^3
  + 1.45×10^-4 x1^4 - 0.32×10^-5 x1^3 x2 + 0.24×10^-5 x1^2 x2^2
  + 0.727×10^-6 x1 x2^3 - 0.304×10^-7 x2^4.   (25)

Closed-loop simulations under the fifth-order approximately optimal nonlinear controller (Eq. (25)) are shown in Fig. 1. It can be seen that the closed-loop system is driven to the origin through use of this controller. Based on Theorem 2, stability regions D_a^ℓ are identified in Fig. 2 for these approximately optimal control strategies. Furthermore, as discussed in Theorem 1, the (ℓ-1)-order coefficients obtained for the (ℓ-1)-order approximation are the same as the (ℓ-1)-order coefficients for the ℓ-order approximation.

3. Constrained infinite-time nonlinear optimal control

In this section, we consider the constrained infinite-time nonlinear quadratic optimal control problem (CITNQOC):

V̄(ξ) = inf_{x,u} ∫_0^∞ ( x(t)^T Q(x(t)) x(t) + u(t)^T R(x(t)) u(t) ) dt
s.t. ẋ(t) = f(x(t)) + g(x(t)) u(t),  x(0) = ξ,
     Ci(x(t), u(t)) ≤ 0,  i = 1, …, p,  ∀t ≥ 0.   (26)
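A straightforward numerical way to probe whether an initial state ξ is constraint-admissible in the sense of problem (26) is to simulate the closed-loop trajectory under an approximate controller and check the constraints Ci ≤ 0 along the way. The sketch below does this for the CSTR (21) under the second-order controller u2 of (23), using the state and input bounds that Example 2 below imposes; the RK4 integrator, horizon T and step size dt are implementation choices, not from the paper.

```python
import numpy as np

def f_cl(x):
    # CSTR dynamics (21) in closed loop with u2(x) of (23).
    u = -0.3931 * x[0] - 0.2163 * x[1]
    dx = np.array([-0.01 * x[0]**2 - 0.338 * x[0] + 0.02 * x[1] + 0.02 * u,
                    0.05 * x[0]**2 + 0.159 * x[0] - 0.03 * x[1]])
    return dx, u

def constraints_ok(x, u):
    # Bounds used in Example 2: -1.59 <= x1 <= 0.16, -4.21 <= x2, |u| <= 10.
    return (-1.59 <= x[0] <= 0.16) and (x[1] >= -4.21) and (abs(u) <= 10.0)

def admissible(xi, T=300.0, dt=0.05):
    """Simulate (RK4) from xi; report False at the first constraint violation."""
    x = np.array(xi, dtype=float)
    for _ in range(int(T / dt)):
        if not constraints_ok(x, f_cl(x)[1]):
            return False
        k1, _ = f_cl(x);             k2, _ = f_cl(x + 0.5 * dt * k1)
        k3, _ = f_cl(x + 0.5 * dt * k2); k4, _ = f_cl(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return constraints_ok(x, f_cl(x)[1])

print(admissible([-0.5, 0.5]))   # True: the trajectory stays within all bounds
print(admissible([ 1.0, 0.0]))   # False: x1 = 1.0 already violates x1 <= 0.16
```

Sweeping such a check over a grid of initial conditions gives a sampled picture of the constraint-admissible set for the chosen controller, in the spirit of the sets defined next.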
Fig. 2. Stability regions D_a^ℓ for different orders of approximation (2-order, a = 978.14; 3-order, a = 467.23; 4-order, a = 1892; 5-order, a = 2279.4).

Let u(x) denote the optimal feedback control law (Eq. (2)) of the unconstrained problem (ITNQOC) defined by (1), while u_ℓ(x) (Eq. (18)) denotes the ℓth-order approximate solution of ITNQOC obtained by the approximation method given earlier. Similarly, x(t; ξ) indicates the trajectory of the unconstrained system with control u(x) given by (2) and initial state x(0) = ξ, while x_ℓ(t; ξ) denotes the trajectory of the unconstrained system with the ℓth-order approximate optimal control u_ℓ(x) and initial state x(0) = ξ. Then the following sets are defined.

Definition 3. The set of initial conditions such that the unconstrained solution violates no constraints is defined as

O∞ = { ξ ∈ R^n | Ci(x(t; ξ), u(x(t; ξ))) ≤ 0 ∀t ≥ 0 }.

Definition 4. The set of initial conditions such that the unconstrained approximate solution violates no constraints is defined as

O∞^ℓ = { ξ ∈ R^n | Ci(x_ℓ(t; ξ), u_ℓ(x_ℓ(t; ξ))) ≤ 0 ∀t ≥ 0 }.

Definition 5. The set of initial conditions which satisfy the constraints is defined as

O0 = { ξ ∈ R^n | Ci(ξ, u(ξ)) ≤ 0 }.

It is shown in [4,14,17] that finiteness of the optimal value of problem (26) implies that there exists a sufficiently large but finite T such that the optimum solution satisfies x*(T) ∈ O∞ for system (26). So the CITNQOC problem can be converted to a constrained finite-time nonlinear quadratic optimal control (CFTNQOC) problem and an unconstrained infinite-time optimal control problem. Thus, when the solution of the CITNQOC problem enters O∞ (or a subset of O∞), the CITNQOC problem can be solved as an unconstrained problem with initial state x*(T). For the proposed approximate method of unconstrained nonlinear optimal control and the aforementioned CITNQOC problem, the following then holds.

Theorem 3.
If for some a > 0, D_a^ℓ ⊂ (O0 ∩ D_ℓ), where D_ℓ and D_a^ℓ are the sets defined in Section 2 (Definitions 1 and 2), then D_a^ℓ ⊂ O∞^ℓ.

Fig. 3. O∞^ℓ for different orders of approximation: (a) the second- through fifth-order approximate solutions shown against the state-constraint set (dashed); (b) comparison of the regions O∞^ℓ.

Proof. Let z ∈ D_a^ℓ ⊂ D_ℓ. Then Σ_{i=2}^{ℓ} Vi(z) ≤ a, Σ_{i=2}^{ℓ} Vi(z) > 0 and Σ_{i=2}^{ℓ} V̇i(z) < 0 from Theorem 2. This implies that x_ℓ(t; z) will remain in D_a^ℓ for all t ≥ 0. Since D_a^ℓ ⊂ O0, this implies that the constraints will be satisfied from then on, and thus z ∈ O∞^ℓ.

Remark 1. The above theorem provides a way to identify a subset of O∞^ℓ. However, the approximate HJB solution methodology outlined above can also be employed to identify O∞^ℓ and, as the order of approximation ℓ increases, O∞ itself (since O∞^ℓ converges to O∞ as ℓ → ∞). This fact is illustrated in Example 2.

Example 2 (Manousiouthakis and Chmielewski [14]). Consider the same CSTR model as in Example 1, with the addition of the inequality constraints -1.59 ≤ x1 ≤ 0.16, -4.21 ≤ x2, -10 ≤ u ≤ 10. We are interested in identifying O∞^ℓ and in demonstrating that, as ℓ increases, the sets O∞^ℓ become indistinguishable from one another. The area surrounded by the dashed lines in Fig. 3(a) is the intersection of the constraint sets imposed on the system states. The constraint on the control input does not appear in these figures, since it is far from the surrounded area at the scale of these figures.

It is observed in Fig. 3(b) that the constraint satisfaction region O∞^2 for the controller corresponding to the second-order approximation of the optimal value function is different from the region O∞^3 corresponding to the third-order approximation. However, O∞^ℓ begins to converge for approximations of order higher than 3. The differences among the regions O∞^3, O∞^4, O∞^5, corresponding to the third-, fourth- and fifth-order approximations, are practically indistinguishable. This observation is important when dealing with the CITNQOC problem. It implies that, for this example, we can use a low-order value function approximation to accurately approximate the set O∞ corresponding to the actual nonlinear model (1) and the optimal nonlinear feedback law (2). Of course, there is no guarantee that, for other nonlinear systems, a low-order approximation will work as well.

4. Conclusions

In this paper, the unconstrained infinite-time nonlinear quadratic optimal control problem is studied for a general class of nonlinear systems. A power series based approximation method is proposed to solve the associated HJB equation. The method involves solution of the Riccati equation for the linearized problem, followed by sequential solution of a series of linear algebraic equations. Uniqueness of the solution and a region of stability are established. The constrained infinite-time nonlinear quadratic optimal control problem is also studied. The aforementioned HJB approximation method is employed to establish regions in which the constrained and unconstrained optimal control problems have identical solutions. An example is employed throughout this work to illustrate the proposed approximation method, and to demonstrate its use in identifying constraint-satisfying regions as well as regions of stability for the approximate optimal feedback laws obtained.

References

[1] E.G. Al'Brekht, On the optimal stabilization of nonlinear systems, J.
Appl. Math. Mech. (PMM) 25 (1961) 836-844 (in Russian).
[2] R.W. Beard, G.N. Saridis, J.T. Wen, Galerkin approximation of the generalized Hamilton-Jacobi-Bellman equation, Automatica 33 (1997) 2159-2177.
[3] R.W. Beard, G.N. Saridis, J.T. Wen, Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation, J. Optim. Theory Appl. 96 (1998) 589-626.
[4] D. Chmielewski, V. Manousiouthakis, Constrained infinite-time quadratic optimal control: the linear stochastic and nonlinear deterministic cases, Proceedings of the American Control Conference, Philadelphia, PA, 1998, pp. 2093-2097.
[5] R.A. Freeman, P.V. Kokotovic, Optimal nonlinear controllers for feedback linearizable systems, Proceedings of the American Control Conference, Seattle, WA, 1995, pp. 2722-2726.
[6] R.A. Freeman, P.V. Kokotovic, Robust Nonlinear Control Design: State-space and Lyapunov Techniques, Birkhauser, Boston, 1996.
[7] W.L. Garrard, Additional results on sub-optimal feedback control of non-linear systems, Internat. J. Control 10 (1969) 657-663.
[8] W.L. Garrard, J.M. Jordan, Design of nonlinear automatic flight control systems, Automatica 13 (1977) 497-505.
[9] A. Halme, R.P. Hamalainen, On the nonlinear regulator problem, J. Optim. Theory Appl. 16 (1975) 255-275.
[10] H.K. Khalil, Nonlinear Systems, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1996.
[11] D. Kirk, Optimal Control Theory: An Introduction, Prentice-Hall, Englewood Cliffs, NJ, 1970.
[12] E.B. Lee, L. Markus, Foundations of Optimal Control Theory, Wiley, New York, 1967.
[13] D.L. Lukes, Optimal regulation of nonlinear dynamical systems, SIAM J. Control Optim. 7 (1969) 75-100.
[14] V. Manousiouthakis, D.J. Chmielewski, On constrained infinite-time nonlinear optimal control, Chem. Eng. Sci. 57 (2002) 105-114.
[15] Y. Nishikawa, N. Sannomiya, H. Itakura, A method for suboptimal design of nonlinear feedback systems, Automatica 7 (1971) 703-712.
[16] G.N. Saridis, C.S.G. Lee, An approximation theory of optimal control for trainable manipulators, IEEE Trans. Systems Man Cybernet. 9 (1979) 152-159.
[17] M. Sznaier, J. Cloutier, Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems, J. Guidance Control Dyn. 23 (2000) 399-405.
[18] R.A. Werner, J.B. Cruz, Feedback control which preserves optimality for systems with unknown parameters, IEEE Trans. Automat. Control 13 (1968) 621-629.