This document discusses developing near-optimal state feedback controllers for nonlinear discrete-time systems using iterative approximate dynamic programming (ADP) algorithms. Specifically:
1) An infinite-horizon optimal state feedback controller is developed for discrete-time systems based on the dual heuristic programming (DHP) algorithm.
2) A new optimal control scheme is developed using the generalized DHP (GDHP) algorithm and a discounted cost functional.
3) An infinite-horizon optimal stabilizing state feedback controller is designed based on the generalized Hamilton-Jacobi-Bellman (GHJB) algorithm.
4) Finite-horizon optimal controllers with an ε-error bound are proposed, where the number of optimal control steps can be determined.
A new iterative ADP algorithm is presented, with a convergence proof, to solve the derived HJB equation.
2.2.1 Problem Formulation
Consider a class of discrete-time affine nonlinear systems as follows:
x(k + 1) = f (x(k)) + g(x(k))u(k), (2.1)
where x(k) ∈ R^n is the state vector, and f : R^n → R^n and g : R^n → R^{n×m} are differentiable in their arguments with f(0) = 0. Assume that f + gu is Lipschitz continuous on a set Ω in R^n containing the origin, and that the system (2.1) is controllable in the sense that there exists at least a continuous control law on Ω that asymptotically stabilizes the system. We denote Ωu = {u(k) | u(k) = [u1(k), u2(k), . . . , um(k)]^T ∈ R^m, |ui(k)| ≤ ūi, i = 1, . . . , m}, where ūi is the saturating bound for the ith actuator. Let Ū ∈ R^{m×m} be the constant diagonal matrix given by Ū = diag{ū1, ū2, . . . , ūm}.
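As a concrete illustration of this setup (a hypothetical instance chosen for the numerical sketches that follow, not an example taken from the text), the following Python snippet defines a two-state, single-input affine system of the form (2.1) with a saturated actuator.

import numpy as np

# Illustrative instance of x(k+1) = f(x(k)) + g(x(k)) u(k) with |u_i| <= u_bar_i.
# The particular dynamics below are arbitrary and only serve as an example.
u_bar = np.array([1.0])              # saturating bound of the single actuator
U_bar = np.diag(u_bar)               # U_bar = diag{u_bar_1, ..., u_bar_m}

def f(x):
    # drift term with f(0) = 0
    return np.array([x[0] + 0.1 * x[1],
                     -0.2 * np.sin(x[0]) + 0.9 * x[1]])

def g(x):
    # input-gain matrix (n x m), here constant
    return np.array([[0.0],
                     [0.1]])

def step(x, u):
    # one step of (2.1); the input is clipped to the admissible set Omega_u
    u = np.clip(u, -u_bar, u_bar)
    return f(x) + g(x) @ u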
In this subsection, we mainly discuss how to design an optimal state feedback
controller for this class of constrained discrete-time systems. It is desired to find the
optimal control law v(x) so that the control sequence u(·) = (u(i), u(i + 1), . . . )
with each u(i) ∈ Ωu minimizes the generalized cost functional as follows:
J(x(k), u(·)) = Σ_{i=k}^{∞} [ x^T(i)Qx(i) + W(u(i)) ],   (2.2)
where u(i) = v(x(i)), W (u(i)) ∈ R is positive definite, and the weight matrix Q is
also positive definite.
For optimal control problems, the state feedback control law v(x) must not only
stabilize the system on Ω but also guarantee that (2.2) is finite. Such a control law
is said to be admissible.
Definition 2.1 A control law v(x) is said to be admissible with respect to (2.2) on
Ω if v(x) is continuous with v(x(k)) ∈ Ωu for ∀x(k) ∈ Ω and stabilizes (2.1) on Ω,
v(0) = 0, and for ∀x(0) ∈ Ω, J (x(0), u(·)) is finite, where u(·) = (u(0), u(1), . . . )
and u(k) = v(x(k)), k = 0, 1, . . . .
Based on the above definition, we are ready to explain the admissible control
law sequence. A control law sequence {ηi } = (η0 , η1 , . . . , η∞ ) is called admissible
if the resultant control sequence (u(0), u(1), . . . , u(∞)) stabilizes the system (2.1)
with any initial state x(0) and guarantees that J (x(0), u(·)) is finite. It should be
mentioned that, in this case, each control action obeys a different control law, i.e.,
u(i) is produced by a control law ηi for i = 0, 1, . . . . The control law sequence
{ηi } = (η0 , η1 , . . . , η∞ ) is also called a nonstationary policy in the literature [2].
For convenience, in the sequel J*(x(k)) is used to denote the optimal value function, which is defined as J*(x(k)) = min_{u(·)} J(x(k), u(·)), and u*(x) is used to denote the corresponding optimal control law.
For the unconstrained control problem, W(u(i)) in the performance functional (2.2) is commonly chosen as the quadratic form of the control input u(i). However, in this subsection, to confront the bounded control problem, we employ a nonquadratic functional as follows:

W(u(i)) = 2 ∫_0^{u(i)} ϕ^{-T}(Ū^{-1}s) Ū R ds,   (2.3)

where ϕ^{-T} denotes (ϕ^{-1})^T and

ϕ^{-1}(u(i)) = [ϕ^{-1}(u1(i)), ϕ^{-1}(u2(i)), . . . , ϕ^{-1}(um(i))]^T,

where R is positive definite and assumed to be diagonal for simplicity of analysis, s ∈ R^m, ϕ ∈ R^m, and ϕ(·) is a bounded one-to-one function satisfying |ϕ(·)| ≤ 1 and belonging to C^p (p ≥ 1) and L2(Ω). Moreover, it is a monotonically increasing odd function with its first derivative bounded by a constant M. Such a function is easy to find; one example is the hyperbolic tangent function ϕ(·) = tanh(·). It should be noticed that, by the definition above, W(u(i)) is ensured to be positive definite because ϕ^{-1}(·) is a monotonic odd function and R is positive definite.
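To make the nonquadratic cost (2.3) concrete, the sketch below (an illustration of mine; the quadrature scheme and variable names are not from the text) evaluates W(u) for a scalar input with ϕ = tanh, so that ϕ^{-1} = arctanh, and diagonal Ū = ū and R = r.

import numpy as np

def W(u, u_bar=1.0, r=1.0, n_grid=2001):
    # W(u) = 2 * integral_0^u arctanh(s/u_bar) * u_bar * r ds  (scalar case of (2.3)),
    # evaluated with a simple trapezoidal rule; requires |u| < u_bar.
    s = np.linspace(0.0, u, n_grid)
    integrand = np.arctanh(s / u_bar) * u_bar * r
    return 2.0 * np.trapz(integrand, s)

# W(0) = 0, W(u) > 0 otherwise, and W grows steeply as u approaches the bound,
# which is what discourages control actions near saturation.
for u in [0.0, 0.3, 0.6, 0.9]:
    print(u, W(u))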
According to Bellman’s principle of optimality, the optimal value function J*(x) should satisfy the following HJB equation:

J*(x(k)) = min_{u(·)} Σ_{i=k}^{∞} [ x^T(i)Qx(i) + 2 ∫_0^{u(i)} ϕ^{-T}(Ū^{-1}s) Ū R ds ]
         = min_{u(k)} { x^T(k)Qx(k) + 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds + J*(x(k + 1)) }.   (2.4)
The optimal control law u*(x) should satisfy

u*(x(k)) = arg min_{u(k)} { x^T(k)Qx(k) + 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds + J*(x(k + 1)) }.   (2.5)
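For intuition, note how (2.5) connects to the DHP setting. Assuming the minimizer is interior and J* is differentiable (a sketch of the standard stationarity argument, not a step carried out at this point in the text), setting the gradient of the bracketed expression with respect to u(k) to zero and using x(k + 1) = f(x(k)) + g(x(k))u(k) gives

2 R Ū ϕ^{-1}(Ū^{-1}u(k)) + g^T(x(k)) ∂J*(x(k + 1))/∂x(k + 1) = 0,

so that

u*(x(k)) = Ū ϕ( −(1/2)(RŪ)^{-1} g^T(x(k)) ∂J*(x(k + 1))/∂x(k + 1) ).

Because |ϕ(·)| ≤ 1, a control of this form automatically satisfies the actuator bounds, which is precisely what the nonquadratic cost (2.3) is designed to achieve; the gradient ∂J*/∂x appearing here is the costate that the DHP algorithm approximates.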
The optimal control problem can be solved if the optimal value function J*(x) can be obtained from (2.4). However, there is currently no method for directly solving this value function for the constrained optimal control problem. Therefore, in the next subsection we will discuss how to utilize the iterative ADP algorithm to seek the near-optimal control solution.
2.2.2 Infinite-Horizon Optimal State Feedback Control via DHP
Since direct solution of the HJB equation is computationally intensive, we develop in this subsection an iterative ADP algorithm, based on Bellman’s principle of optimality and the greedy iteration principle.
First, we start with the initial value function V0(·) = 0, which is not necessarily the optimal value function. Then, we find the single control law v0(x) as follows:

v0(x(k)) = arg min_{u(k)} { x^T(k)Qx(k) + 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds + V0(x(k + 1)) },   (2.6)

and we update the value function by

V1(x(k)) = x^T(k)Qx(k) + 2 ∫_0^{v0(x(k))} ϕ^{-T}(Ū^{-1}s) Ū R ds.   (2.7)
Therefore, for i = 1, 2, . . . , the iterative ADP algorithm iterates between

vi(x(k)) = arg min_{u(k)} { x^T(k)Qx(k) + 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds + Vi(x(k + 1)) }   (2.8)

and

Vi+1(x(k)) = min_{u(k)} { x^T(k)Qx(k) + 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds + Vi(x(k + 1)) }.   (2.9)

It can be seen that, based on (2.8), (2.9) can further be written as

Vi+1(x(k)) = x^T(k)Qx(k) + 2 ∫_0^{vi(x(k))} ϕ^{-T}(Ū^{-1}s) Ū R ds + Vi(x(k + 1)),   (2.10)

where x(k + 1) = f(x(k)) + g(x(k))vi(x(k)).
In summary, in this iterative algorithm, the value function sequence {Vi} and control law sequence {vi} are updated by implementing the recurrent iteration between (2.8) and (2.10) with the iteration number i increasing from 0 to ∞.
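To show how the recursion (2.6)-(2.10) might be carried out numerically, here is a minimal sketch for a scalar example system (the dynamics, grids, and interpolation below are my own illustrative choices; the chapter's DHP implementation instead approximates the costate ∂V/∂x and the control law with neural networks rather than enumerating grids).

import numpy as np

# Toy instance of the iteration (2.8)-(2.9) for a scalar system
# x(k+1) = f(x) + g(x)*u with |u| <= u_bar, phi = tanh, Q = q, R = r.
u_bar, q, r = 1.0, 1.0, 1.0
f = lambda x: 0.9 * x + 0.1 * x**2            # example drift, f(0) = 0
g = lambda x: 1.0                             # example (constant) input gain

xs = np.linspace(-2.0, 2.0, 201)              # state grid
us = np.linspace(-0.999 * u_bar, 0.999 * u_bar, 101)   # admissible control grid

def W(u):
    # nonquadratic control cost 2*int_0^u arctanh(s/u_bar)*u_bar*r ds (scalar case)
    s = np.linspace(0.0, u, 200)
    return 2.0 * np.trapz(np.arctanh(s / u_bar) * u_bar * r, s)

W_table = np.array([W(u) for u in us])

V = np.zeros_like(xs)                         # V_0(.) = 0
for i in range(60):
    V_next = np.empty_like(V)
    policy = np.empty_like(V)
    for ix, x in enumerate(xs):
        x_next = f(x) + g(x) * us             # successors for every candidate u
        # V_i(x(k+1)) by linear interpolation; np.interp clamps outside the grid,
        # a crude but serviceable boundary treatment for this sketch
        cost = q * x**2 + W_table + np.interp(x_next, xs, V)
        j = np.argmin(cost)
        V_next[ix], policy[ix] = cost[j], us[j]   # (2.8) and (2.9) on the grid
    if np.max(np.abs(V_next - V)) < 1e-6:     # convergence of {V_i}
        break
    V = V_next

print("iterations:", i, " V(0) =", V[np.argmin(np.abs(xs))])

As i grows, V should increase monotonically toward a bounded limit, in line with Lemma 2.3 and Theorem 2.4 below, and the stored policy approximates the corresponding control law vi.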
To further explain the iteration process, we now analyze this iterative algorithm. First, based on (2.10) we obtain
Vi(x(k + 1)) = x^T(k + 1)Qx(k + 1) + 2 ∫_0^{vi−1(x(k+1))} ϕ^{-T}(Ū^{-1}s) Ū R ds + Vi−1(x(k + 2)),   (2.11)

where x(k + 2) = f(x(k + 1)) + g(x(k + 1))vi−1(x(k + 1)). Then, by further expanding (2.10), we have

Vi+1(x(k)) = x^T(k)Qx(k) + 2 ∫_0^{vi(x(k))} ϕ^{-T}(Ū^{-1}s) Ū R ds
           + x^T(k + 1)Qx(k + 1) + 2 ∫_0^{vi−1(x(k+1))} ϕ^{-T}(Ū^{-1}s) Ū R ds
           + · · · + x^T(k + i)Qx(k + i) + 2 ∫_0^{v0(x(k+i))} ϕ^{-T}(Ū^{-1}s) Ū R ds
           + V0(x(k + i + 1)),   (2.12)

where V0(x(k + i + 1)) = 0.
From (2.12), it can be seen that during the iteration process, the control actions
for different control steps obey different control laws. After the iteration number
i + 1, the obtained control law sequence is (vi , vi−1 , . . . , v0 ). With the iteration
number i increasing to ∞, the obtained control law sequence has a length of ∞. For
the infinite-horizon problem, both the optimal value function and the optimal control
law are unique. Therefore, it is desired that the control law sequence will converge
when the iteration number i → ∞. In the following, we will prove that both the
value function sequence {Vi } and the control law sequence {vi } are convergent.
In this subsection, in order to prove the convergence characteristics of the iterative ADP algorithm for the constrained nonlinear system, we first present two lemmas before presenting our theorems. For convenience, the nonquadratic functional 2 ∫_0^{u(k)} ϕ^{-T}(Ū^{-1}s) Ū R ds will be written as W(u(k)) in the sequel.
Lemma 2.2 Let {μi } be an arbitrary sequence of control laws, and {vi } be the
control law sequence as in (2.8). Let Vi be as in (2.9) and Λi be
Λi+1 (x(k)) = x T (k)Qx(k) + W (μi (x(k))) + Λi (x(k + 1)). (2.13)
If V0 (·) = Λ0 (·) = 0, then Vi (x) ≤ Λi (x), ∀i.
Proof It is clear from the fact that Vi+1 is the result of minimizing the right hand
side of (2.9) with respect to the control input u(k), while Λi+1 is a result of arbitrary
control input.
Lemma 2.3 Let the sequence {Vi } be defined as in (2.9). If the system is control-
lable, then there is an upper bound Y such that 0 ≤ Vi (x(k)) ≤ Y, ∀i.
Proof Let {ηi (x)} be a sequence of stabilizing and admissible control laws, and let
V0 (·) = P0 (·) = 0, where Vi is updated by (2.9) and Pi is updated by
Pi+1 (x(k)) = x T (k)Qx(k) + W (ηi (x(k))) + Pi (x(k + 1)). (2.14)
From (2.14), we further obtain
Pi (x(k + 1)) = x T (k + 1)Qx(k + 1) + W (ηi−1 (x(k + 1)))
+ Pi−1 (x(k + 2)). (2.15)
Thus, the following relation can be obtained:
P_{i+1}(x(k)) = x^T(k)Qx(k) + W(η_i(x(k)))
  + x^T(k+1)Qx(k+1) + W(η_{i−1}(x(k+1))) + P_{i−1}(x(k+2))
= x^T(k)Qx(k) + W(η_i(x(k))) + x^T(k+1)Qx(k+1) + W(η_{i−1}(x(k+1)))
  + x^T(k+2)Qx(k+2) + W(η_{i−2}(x(k+2))) + P_{i−2}(x(k+3))
  ⋮
= x^T(k)Qx(k) + W(η_i(x(k))) + x^T(k+1)Qx(k+1) + W(η_{i−1}(x(k+1)))
  + x^T(k+2)Qx(k+2) + W(η_{i−2}(x(k+2))) + ⋯ + x^T(k+i)Qx(k+i) + W(η_0(x(k+i)))
  + P_0(x(k+i+1)),   (2.16)
where P0 (x(k + i + 1)) = 0.
Let li (x(k)) = x T (k)Qx(k)+W (ηi (x(k))), and then (2.16) can further be written
as
P_{i+1}(x(k)) = \sum_{j=0}^{i} l_{i−j}(x(k+j))
            = \sum_{j=0}^{i} [ x^T(k+j)Qx(k+j) + W(η_{i−j}(x(k+j))) ]
            ≤ \lim_{i→∞} \sum_{j=0}^{i} [ x^T(k+j)Qx(k+j) + W(η_{i−j}(x(k+j))) ].   (2.17)
Note that {ηi (x)} is an admissible control law sequence, i.e., x(k) → 0 as k →
∞. Therefore there exists an upper bound Y such that
∀i :  P_{i+1}(x(k)) ≤ \lim_{i→∞} \sum_{j=0}^{i} l_{i−j}(x(k+j)) ≤ Y.   (2.18)
Combining with Lemma 2.2, we obtain
∀i : Vi+1 (x(k)) ≤ Pi+1 (x(k)) ≤ Y. (2.19)
This completes the proof.
Next, Lemmas 2.2 and 2.3 will be used in the proof of our main theorems.
Theorem 2.4 (cf. [17]) Define the value function sequence {Vi } as in (2.10) with
V0 (·) = 0, and the control law sequence {vi } as in (2.8). Then, we can conclude that
{Vi } is a nondecreasing sequence satisfying Vi+1 (x(k)) ≥ Vi (x(k)), ∀i.
Proof For convenience of analysis, define a new sequence {Φi } as follows:
Φi+1 (x(k)) = x T (k)Qx(k) + W (vi+1 (x(k))) + Φi (x(k + 1)), (2.20)
where Φ0 (·) = V0 (·) = 0. The control law sequence {vi } is updated by (2.8) and the
value function sequence {Vi } is updated by (2.10).
In the following, we prove that Φi (x(k)) ≤ Vi+1 (x(k)) by mathematical induc-
tion.
First, we prove that it holds for i = 0. Noticing that
V1 (x(k)) − Φ0 (x(k)) = x T (k)Qx(k) + W (v0 (x(k))) ≥ 0, (2.21)
thus for i = 0, we have
V1 (x(k)) ≥ Φ0 (x(k)). (2.22)
Second, we assume that it holds for i − 1. That is to say, for any x(k), we have
Vi (x(k)) ≥ Φi−1 (x(k)). Then, for i, since
Φi (x(k)) = x T (k)Qx(k) + W (vi (x(k))) + Φi−1 (x(k + 1)) (2.23)
and
Vi+1 (x(k)) = x T (k)Qx(k) + W (vi (x(k))) + Vi (x(k + 1)) (2.24)
hold, we obtain
Vi+1 (x(k)) − Φi (x(k)) = Vi (x(k + 1)) − Φi−1 (x(k + 1)) ≥ 0, (2.25)
i.e., the following relation holds:
Φi (x(k)) ≤ Vi+1 (x(k)). (2.26)
Therefore, (2.26) is proved for any i by mathematical induction.
Furthermore, from Lemma 2.2 we know that Vi (x(k)) ≤ Φi (x(k)). Therefore we
have
Vi (x(k)) ≤ Φi (x(k)) ≤ Vi+1 (x(k)). (2.27)
The proof is completed.
Next, we are ready to examine the limit of the value function sequence {V_i} when i → ∞.
Let {η_i^{(l)}} be the lth admissible control law sequence. Similar to the proof of Lemma 2.3, we can construct the associated sequence P_i^{(l)}(x) as follows:
P_{i+1}^{(l)}(x(k)) = x^T(k)Qx(k) + W(η_i^{(l)}(x(k))) + P_i^{(l)}(x(k+1)),   (2.28)
with P_0^{(l)}(·) = 0.
Let l_i^{(l)}(x(k)) = x^T(k)Qx(k) + W(η_i^{(l)}(x(k))). Then, the following relation can be obtained similarly:
P_{i+1}^{(l)}(x(k)) = \sum_{j=0}^{i} l_{i−j}^{(l)}(x(k+j)).   (2.29)
Let i → ∞; we have
P_∞^{(l)}(x(k)) = \lim_{i→∞} \sum_{j=0}^{i} l_{i−j}^{(l)}(x(k+j)).   (2.30)
Combining (2.29) with (2.30), we obtain
P_{i+1}^{(l)}(x(k)) ≤ P_∞^{(l)}(x(k)).   (2.31)
Theorem 2.5 (cf. [17]) Define P_∞^{(l)}(x(k)) as in (2.30), and the value function sequence {V_i} as in (2.10) with V_0(·) = 0. For any state vector x(k), define J^*(x(k)) = \inf_l { P_∞^{(l)}(x(k)) }, which can be considered as the "optimal" value function starting from x(k) under all admissible control law sequences of infinite length. Then, we can conclude that J^* is the limit of the value function sequence {V_i}.
Proof According to the definition of P_∞^{(l)}(x(k)), the associated control law sequence {η_i^{(l)}(x)} is admissible. Thus, it is guaranteed that \lim_{i→∞} \sum_{j=0}^{i} l_{i−j}^{(l)}(x(k+j)) is finite, i.e., P_∞^{(l)}(x(k)) is finite. Hence, for any l, there exists an upper bound Y_l such that
P_{i+1}^{(l)}(x(k)) ≤ P_∞^{(l)}(x(k)) ≤ Y_l.   (2.32)
Combining with Lemma 2.2, we further obtain
∀l, i :  V_{i+1}(x(k)) ≤ P_{i+1}^{(l)}(x(k)) ≤ Y_l.   (2.33)
Since J^*(x(k)) = \inf_l { P_∞^{(l)}(x(k)) }, for any ε > 0, there exists a sequence of admissible control laws {η_i^{(K)}} such that the associated value function satisfies P_∞^{(K)}(x(k)) ≤ J^*(x(k)) + ε. According to (2.33), we have V_i(x(k)) ≤ P_i^{(l)}(x(k)) for any l and i. Thus, we obtain \lim_{i→∞} V_i(x(k)) ≤ P_∞^{(K)}(x(k)) ≤ J^*(x(k)) + ε. Noting that ε is chosen arbitrarily, we have
\lim_{i→∞} V_i(x(k)) ≤ J^*(x(k)).   (2.34)
On the other hand, since V_{i+1}(x(k)) ≤ P_{i+1}^{(l)}(x(k)) ≤ Y_l, ∀l, i, we have \lim_{i→∞} V_i(x(k)) ≤ \inf_l {Y_l}. According to the definition of an admissible control law sequence, the control law sequence associated with the value function \lim_{i→∞} V_i(x(k)) must be an admissible control law sequence, i.e., there exists a sequence of admissible control laws {η_i^{(N)}} such that \lim_{i→∞} V_i(x(k)) = P_∞^{(N)}(x(k)). Combining with the definition J^*(x(k)) = \inf_l { P_∞^{(l)}(x(k)) }, we can obtain
\lim_{i→∞} V_i(x(k)) ≥ J^*(x(k)).   (2.35)
Therefore, combining (2.34) with (2.35), we can conclude that limi→∞ Vi (x(k))
= J ∗ (x(k)), i.e., J ∗ is the limit of the value function sequence {Vi }.
The proof is completed.
Next, let us consider what happens when we let i → ∞ in (2.9). The left hand side is simply V_∞(x). The right hand side, however, is not obvious, since the minimum is attained at a different u(k) for each i. Nevertheless, the following result can be proved.
Theorem 2.6 For any state vector x(k), the “optimal” value function J ∗ (x) satis-
fies the HJB equation
J^*(x(k)) = \inf_{u(k)} { x^T(k)Qx(k) + W(u(k)) + J^*(x(k+1)) }.
Proof For any u(k) and i, according to (2.9), we have
Vi (x(k)) ≤ x T (k)Qx(k) + W (u(k)) + Vi−1 (x(k + 1)). (2.36)
According to Theorems 2.4 and 2.5, the value function sequence {Vi } is a non-
decreasing sequence satisfying limi→∞ Vi (x(k)) = J ∗ (x(k)), hence the relation
Vi−1 (x(k + 1)) ≤ J ∗ (x(k + 1)) holds for any i. Thus, we obtain
Vi (x(k)) ≤ x T (k)Qx(k) + W (u(k)) + J ∗ (x(k + 1)). (2.37)
Let i → ∞; we have
J ∗ (x(k)) ≤ x T (k)Qx(k) + W (u(k)) + J ∗ (x(k + 1)). (2.38)
Since u(k) in the above inequality is chosen arbitrarily, the following relation holds:
J^*(x(k)) ≤ \inf_{u(k)} { x^T(k)Qx(k) + W(u(k)) + J^*(x(k+1)) }.   (2.39)
On the other hand, for any i the value function sequence satisfies
V_i(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + W(u(k)) + V_{i−1}(x(k+1)) }.   (2.40)
Combining with V_i(x(k)) ≤ J^*(x(k)), ∀i, we have
J^*(x(k)) ≥ \inf_{u(k)} { x^T(k)Qx(k) + W(u(k)) + V_{i−1}(x(k+1)) }.   (2.41)
Let i → ∞; then we obtain
J^*(x(k)) ≥ \inf_{u(k)} { x^T(k)Qx(k) + W(u(k)) + J^*(x(k+1)) }.   (2.42)
Combining (2.39) and (2.42), we have
J^*(x(k)) = \inf_{u(k)} { x^T(k)Qx(k) + W(u(k)) + J^*(x(k+1)) }.   (2.43)
The proof is completed.
According to Theorems 2.4 and 2.5, we can conclude that Vi (x(k)) ≤ Vi+1 (x(k)),
∀i and limi→∞ Vi (x(k)) = J ∗ (x(k)). Furthermore, according to Theorem 2.6, we
have J ∗ (x(k)) = infu(k) {x T (k)Qx(k) + W (u(k)) + J ∗ (x(k + 1))}. Therefore, we
can conclude that the value function sequence {Vi } converges to the optimal value
function of the discrete-time HJB equation, i.e., Vi → J ∗ as i → ∞. Since the value
function sequence is convergent, according to (2.5) and (2.8), we can conclude that
the corresponding control law sequence {vi } converges to the optimal control law
u∗ as i → ∞.
It should be mentioned that the value function V_i(x) we constructed is a new function that is different from the ordinary cost function. Via Lemma 2.3 and Theorem 2.4, we have shown that, for any x(k) ∈ Ω, the function sequence {V_i(x(k))} is a nondecreasing sequence with an upper bound. This is in contrast to other work in the literature, e.g., [5], where the value functions are constructed as a nonincreasing sequence with a lower bound. Moreover, it should be noted that we do not require every control law in the sequence {v_i} to be admissible. What we need is for the control law sequence as a whole to be admissible, i.e., the resultant sequence of control vectors can stabilize the system.
Next, we are ready to discuss the implementation of the iterative ADP algorithm.
(1) Derivation of the iterative DHP algorithm. First, we assume that the value
function Vi (x) is smooth. In order to implement the iteration between (2.8) and
(2.10), for i = 0, 1, . . . , we further assume that the minimum of the right hand side
of (2.8) can be exactly solved by letting the gradient of the right hand side of (2.8)
with respect to u(k) equal to zero, i.e.,
∂( x^T(k)Qx(k) + W(u(k)) )/∂u(k) + ( ∂x(k+1)/∂u(k) )^T ∂V_i(x(k+1))/∂x(k+1) = 0.   (2.44)
Therefore, for i = 0, 1, . . . , the corresponding control law vi (x) can be obtained by
solving the above equation, i.e.,
v_i(x(k)) = Ū ϕ( −(1/2)(Ū R̄)^{-1} g^T(x(k)) ∂V_i(x(k+1))/∂x(k+1) ).   (2.45)
From (2.45), we find that the control law v_i(x) at each iteration step has to be computed from ∂V_i(x(k+1))/∂x(k+1), which is not an easy task. Furthermore, at each iteration step of the value function V_{i+1}(x(k)) in (2.10), there is an integral term 2\int_0^{v_i(x(k))} ϕ^{-T}(Ū^{-1}s)Ū R̄ ds to compute, which imposes a large computational burden. Therefore, in the following we present another method, called the iterative DHP algorithm, to implement the iterative ADP algorithm.
Define the costate function λ(x) = ∂V (x)/∂x. Here, we assume that the value
function V (x) is smooth so that λ(x) exists. Then, the recurrent iteration between
(2.8) and (2.10) can be implemented as follows.
First, we start with an initial costate function λ0 (·) = 0. Then, for i = 0, 1, . . . ,
by substituting λi (x) = ∂Vi (x)/∂x into (2.45), we obtain the corresponding control
law vi (x) as
v_i(x(k)) = Ū ϕ( −(1/2)(Ū R̄)^{-1} g^T(x(k)) λ_i(x(k+1)) ).   (2.46)
For λ_{i+1}(x(k)) = ∂V_{i+1}(x(k))/∂x(k), according to (2.10) we can obtain
λ_{i+1}(x(k)) = ∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂x(k)
  + ( ∂v_i(x(k))/∂x(k) )^T ∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂v_i(x(k))
  + ( ∂x(k+1)/∂x(k) )^T ∂V_i(x(k+1))/∂x(k+1)
  + ( ∂v_i(x(k))/∂x(k) )^T ( ∂x(k+1)/∂v_i(x(k)) )^T ∂V_i(x(k+1))/∂x(k+1)
= ∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂x(k)
  + ( ∂v_i(x(k))/∂x(k) )^T [ ∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂v_i(x(k)) + ( ∂x(k+1)/∂v_i(x(k)) )^T ∂V_i(x(k+1))/∂x(k+1) ]
  + ( ∂x(k+1)/∂x(k) )^T ∂V_i(x(k+1))/∂x(k+1).   (2.47)
According to (2.44) and (2.45), we have
∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂v_i(x(k)) + ( ∂x(k+1)/∂v_i(x(k)) )^T ∂V_i(x(k+1))/∂x(k+1) = 0.   (2.48)
Therefore (2.47) can further be written as
λ_{i+1}(x(k)) = ∂( x^T(k)Qx(k) + W(v_i(x(k))) )/∂x(k) + ( ∂x(k+1)/∂x(k) )^T ∂V_i(x(k+1))/∂x(k+1),   (2.49)
i.e.,
λ_{i+1}(x(k)) = 2Qx(k) + ( ∂x(k+1)/∂x(k) )^T λ_i(x(k+1)).   (2.50)
Therefore, the iteration between (2.46) and (2.50) is an implementation of the iteration between (2.8) and (2.10). From (2.46), the control law v_i can be obtained directly from the costate function. Hence, the iteration of the value function in (2.10) can be omitted in the implementation of this iterative algorithm. Considering the principle of the DHP algorithm in Chap. 1, we refer to this iterative algorithm as the iterative DHP algorithm.
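As a rough illustration of how (2.46) and (2.50) interact, the following Python sketch performs one DHP step at a single state for a scalar system. The functions f, g, lam_i, the small inner fixed-point loop for v, and the finite-difference Jacobian are all assumptions made for this sketch; they are not part of the neural-network implementation described next.

```python
import numpy as np

def dhp_step(f, g, lam_i, x, Q, R, U_bar, eps=1e-5):
    """One iterative-DHP step at a scalar state x: v_i(x) from (2.46), lambda_{i+1}(x) from (2.50)."""
    v = 0.0
    for _ in range(10):                       # crude fixed point: (2.46) needs lam_i at x(k+1)
        x_next = f(x) + g(x) * v
        v = U_bar * np.tanh(-0.5 * (1.0 / (U_bar * R)) * g(x) * lam_i(x_next))
    x_next = f(x) + g(x) * v
    # (2.50): lambda_{i+1}(x) = 2 Q x + (dx(k+1)/dx(k))^T lambda_i(x(k+1)),
    # with the state Jacobian approximated by a forward difference in x
    dxnext_dx = ((f(x + eps) + g(x + eps) * v) - x_next) / eps
    lam_next = 2.0 * Q * x + dxnext_dx * lam_i(x_next)
    return v, lam_next

# example usage with an illustrative scalar system and lambda_0(.) = 0
v0, lam1 = dhp_step(lambda x: 0.9 * x - 0.1 * x**3, lambda x: 1.0,
                    lambda x: 0.0, x=0.5, Q=1.0, R=0.5, U_bar=0.3)
```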
Next, we present a convergence analysis of the iteration between (2.46) and
(2.50).
Theorem 2.7 Define the control law sequence {vi } as in (2.8), and update the value
function sequence {Vi } by (2.10) with V0 (·) = 0. Define the costate function se-
quence {λi } as in (2.50) with λ0 (·) = 0. Then, the costate function sequence {λi }
and the control law sequence {vi } are convergent as i → ∞. The optimal value λ∗
is defined as the limit of the costate function λi when vi approaches the optimal
value u∗ .
Proof According to Theorems 2.4–2.6, we have proved that limi→∞ Vi (x(k)) =
J ∗ (x(k)), and J ∗ (x(k)) satisfies the corresponding HJB equation, i.e.,
J ∗ (x(k)) = inf {x T (k)Qx(k) + W (u(k)) + J ∗ (x(k + 1))}.
u(k)
Therefore, we conclude that the value function sequence {Vi } converges to the
optimal value function of the DTHJB equation, i.e., Vi → J ∗ as i → ∞. With
λi (x(k)) = ∂Vi (x(k))/∂x(k), we conclude that the corresponding costate function
sequence {λi } is also convergent with λi → λ∗ as i → ∞. Since the costate func-
tion is convergent, we can conclude that the corresponding control law sequence
{vi } converges to the optimal control law u∗ as i → ∞.
Remark 2.8 In the iterative DHP algorithm, via the costate sequence (2.50), the cor-
responding control law sequence can be directly obtained by (2.46), which does not
require the computation of ∂V_i(x(k+1))/∂x(k+1). Furthermore, in (2.10) there is an integral term 2\int_0^{v_i(x(k))} ϕ^{-T}(Ū^{-1}s)Ū R̄ ds to compute at each iteration step, which is not an easy task. However, in (2.50) the integral term has been removed,
which greatly reduces the computational burden. On the other hand, in order to com-
pute the costate function by (2.50), the internal dynamics f (x(k)) and g(x(k)) of
the system are needed. In the implementation part of the algorithm, a model network
is constructed to approximate the nonlinear dynamics of the system, which avoids
the requirement of known f (x(k)) and g(x(k)).
(2) RBFNN implementation of the iterative DHP algorithm. In the iterative DHP
algorithm, the optimal control is difficult to obtain analytically. For example, in (2.46), the control at step k is a function of the costate at step k + 1. A closed-form explicit solution is difficult, if not impossible, to obtain. Therefore we need to use
parametric structures, such as fuzzy models [15] or neural networks, to approxi-
mate the costate function and the corresponding control law in the iterative DHP
algorithm. In this subsection, we choose radial basis function (RBF) NNs to ap-
proximate the nonlinear functions.
An RBFNN consists of three layers (input, hidden, and output). Each input value
is assigned to a node in the input layer and passed directly to the hidden layer with-
out weights. Nodes at the hidden layer are called RBF units, determined by a vector
called center and a scalar called width. The Gaussian density function is used as an
activation function for the hidden neurons. Then, linear output weights connect the
hidden and output layers. The overall input–output equation of the RBFNN is given
as
y_i = b_i + \sum_{j=1}^{h} w_{ji} φ_j(X),   (2.51)
where X is the input vector, φ_j(X) = exp(−‖X − C_j‖²/σ_j²) is the activation function of the jth RBF unit in the hidden layer, C_j ∈ R^n is the center of the jth RBF unit, h is the number of RBF units, b_i and w_{ji} are the bias term and the weight between the hidden and output layers, and y_i is the ith output in the m-dimensional space.
Once the optimal RBF centers are established over a wide range of operating points
of the plant, the width of the ith center in the hidden layer is calculated by the
following formula:
σ_i = (1/h) \sum_{j=1}^{h} \sum_{k=1}^{n} ‖c_{ki} − c_{kj}‖,   (2.52)
where c_{ki} and c_{kj} are the kth components of the centers of the ith and jth RBF units, respectively. In (2.51) and (2.52), ‖·‖ represents the Euclidean norm. To avoid extensive computational complexity during training, the batch-mode k-means clustering algorithm is used to calculate the centers of the RBF units.
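The following Python sketch spells out the RBF computations in (2.51) and (2.52); the function names, array shapes, and example centers are illustrative, and the centers are assumed to be given (e.g., from a k-means step as mentioned above).

```python
import numpy as np

def rbf_forward(X, centers, widths, W, b):
    """y_i = b_i + sum_j w_ji * exp(-||X - C_j||^2 / sigma_j^2), per (2.51)."""
    phi = np.exp(-np.sum((centers - X) ** 2, axis=1) / widths ** 2)
    return b + W.T @ phi                      # W has shape (h, m), phi has shape (h,)

def rbf_widths(centers):
    """sigma_i = (1/h) * sum_j sum_k |c_ki - c_kj|, following (2.52)."""
    h = centers.shape[0]
    return np.array([np.sum(np.abs(centers - centers[i])) / h for i in range(h)])

# tiny illustrative usage with h = 3 centers in R^2 and m = 2 outputs
centers = np.array([[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
widths = rbf_widths(centers)
y = rbf_forward(np.array([0.2, -0.1]), centers, widths,
                W=np.zeros((3, 2)), b=np.zeros(2))
```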
In order to implement the iterative ADP algorithm, i.e., implement the iteration
between (2.46) and (2.50), we employ RBFNNs to approximate the costate func-
tion λi (x) and the corresponding control law vi (x) at each iteration step i. In the
implementation of the iterative DHP algorithm, there are three networks, which are
model network, critic network and action network, respectively. All the neural net-
works are chosen as RBF networks. The inputs of the model network are x(k) and
vi (x(k)) and the inputs of the critic network and action network are x(k + 1) and
x(k), respectively. The diagram of the whole structure is shown in Fig. 2.1.
For unknown plants, before carrying out the iterative DHP algorithm, we first train the model network. For any given x(k) and v̂_i(x(k)), we obtain x(k+1), and the output of the model network is denoted
x̂(k+1) = w_m^T φ(I_m(k)),   (2.53)
where I_m(k) = [x^T(k)  v̂_i^T(x(k))]^T is the input vector of the model network.
We define the error function of the model network as
e_m(k) = x̂(k+1) − x(k+1).   (2.54)
The weights in the model network are updated to minimize the following performance measure:
E_m(k) = (1/2) e_m^T(k) e_m(k).   (2.55)
The weight updating rule for the model network is chosen as a gradient-based adaptation rule
w_m(k+1) = w_m(k) − α_m ∂E_m(k)/∂w_m(k),   (2.56)
where α_m is the learning rate of the model network.
After the model network is trained, its weights are kept unchanged.
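A minimal sketch of the model-network update (2.53)-(2.56) for a single training sample is given below, assuming the RBF features φ(I_m(k)) have already been computed; the names and shapes are illustrative.

```python
import numpy as np

def train_model_step(w_m, phi_Im, x_next_true, alpha_m=0.1):
    """One gradient step on E_m = 0.5*||x_hat - x||^2 with x_hat = w_m^T phi(I_m)."""
    x_hat = w_m.T @ phi_Im                    # model prediction (2.53)
    e_m = x_hat - x_next_true                 # error (2.54)
    # dE_m/dw_m = phi * e_m^T, so the update (2.56) is an outer-product correction
    w_m = w_m - alpha_m * np.outer(phi_Im, e_m)
    return w_m, 0.5 * float(e_m @ e_m)        # updated weights and E_m from (2.55)

# illustrative shapes: h = 9 RBF units, n = 2 state components
w_m, E_m = train_model_step(np.zeros((9, 2)), np.ones(9) / 9.0, np.array([0.1, -0.2]))
```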
Fig. 2.1 The structure diagram of the iterative DHP algorithm
The critic network is used to approximate the costate function λ_{i+1}(x). The output of the critic network is denoted
λ̂_{i+1}(x(k)) = w_{c(i+1)}^T φ(x(k)).   (2.57)
The target costate function is given as in (2.50). Define the error function for the critic network as
e_{c(i+1)}(k) = λ̂_{i+1}(x(k)) − λ_{i+1}(x(k)).   (2.58)
The objective function to be minimized for the critic network is
E_{c(i+1)}(k) = (1/2) e_{c(i+1)}^T(k) e_{c(i+1)}(k).   (2.59)
The weight updating rule for the critic network is a gradient-based adaptation given by
w_{c(i+1)}(j+1) = w_{c(i+1)}(j) − α_c ∂E_{c(i+1)}(k)/∂w_{c(i+1)}(j),   (2.60)
where α_c > 0 is the learning rate of the critic network, and j is the inner-loop iteration step for updating the weight parameters.
In the action network, the state x(k) is used as the input of the network and the output can be formulated as
v̂_i(x(k)) = w_{ai}^T φ(x(k)).   (2.61)
The target value of the control v_i(x(k)) is obtained by (2.46). So we can define the error function of the action network as
e_{ai}(k) = v̂_i(x(k)) − v_i(x(k)).   (2.62)
The weights of the action network are updated to minimize the following performance error measure:
E_{ai}(k) = (1/2) e_{ai}^T(k) e_{ai}(k).   (2.63)
The updating algorithm is then similar to the one for the critic network. By the gradient descent rule, we obtain
w_{ai}(j+1) = w_{ai}(j) − α_a ∂E_{ai}(k)/∂w_{ai}(j),   (2.64)
where αa > 0 is the learning rate of the action network, and j is the inner-loop
iteration step for updating the weight parameters.
From the neural-network implementation, we can find that in this iterative DHP algorithm, ∂V_i(x(k+1))/∂x(k+1) is replaced by λ̂_i(x(k+1)), which is just the output of the critic network. Therefore, it is more accurate than computing by backpropagation through the critic network as in [1].
(3) Design procedure of the approximate optimal controller. Based on the itera-
tive DHP algorithm, the design procedure of the optimal control scheme is summa-
rized as follows:
1. Choose i_max, j^a_max, j^c_max, ε_m, ε_0, Ū, α_m, α_c, α_a, and the weight matrices Q and R.
2. Construct the model network x̂(k+1) = w_m^T φ(I_m(k)) with the initial weight parameters w_m0 chosen randomly from [−0.1, 0.1], and train the model network with a random input vector uniformly distributed in the interval [−1, 1] and an arbitrary initial state vector in [−1, 1] until the given accuracy ε_m is reached.
3. Set the iteration step i = 0. Set the initial weight parameters of the critic network w_c0 to zero so that the initial value of the costate function λ_0(·) = 0, and initialize the action network with the weight parameters w_a0 chosen randomly in [−0.1, 0.1].
4. Choose an array of state vectors x(k) = (x^(1)(k), x^(2)(k), ..., x^(p)(k)) randomly from the operation region and compute the corresponding output target v_i(x(k)) = (v_i(x^(1)(k)), v_i(x^(2)(k)), ..., v_i(x^(p)(k))) by (2.46), where the state vector at the next time instant x(k+1) = (x^(1)(k+1), x^(2)(k+1), ..., x^(p)(k+1)) is computed by the model network (2.53). With the same state vector x(k) and x(k+1), compute the resultant output target λ_{i+1}(x(k)) = (λ_{i+1}(x^(1)(k)), λ_{i+1}(x^(2)(k)), ..., λ_{i+1}(x^(p)(k))) by (2.50).
5. Set w_c(i+1) = w_ci. With the data set (x^(j)(k), λ_{i+1}(x^(j)(k))), j = 1, 2, ..., p, update the weight parameters of the critic network w_c(i+1) by (2.60) for j^c_max steps to get the approximate costate function λ̂_{i+1}.
6. With the data set (x^(j)(k), v_i(x^(j)(k))), j = 1, 2, ..., p, update the weight parameters of the action network w_ai by (2.64) for j^a_max steps to get the approximate control law v̂_i.
7. If ‖λ_{i+1}(x(k)) − λ_i(x(k))‖² < ε_0, go to Step 9; otherwise, go to Step 8.
8. If i > i_max, go to Step 9; otherwise, set i = i + 1 and go to Step 4.
9. Set the final approximate optimal control law û^*(x) = v̂_i(x).
10. Stop.
As stated in the last subsection, the iterative algorithm converges, with λ_i(x) → λ^*(x) and the control law sequence v_i(x) → u^*(x) as i → ∞. However, in practical applications, we cannot implement the iteration until i → ∞. Instead, we run the algorithm for a maximum number of iterations i_max or until a pre-specified accuracy ε_0 is reached, to test the convergence of the algorithm. In the above procedure, there are two levels of loops. The outer loop starts from Step 3 and ends at Step 8. There are two inner loops, in Steps 5 and 6, respectively. The inner loop of Step 5 includes j^c_max iterative steps, and the inner loop of Step 6 includes j^a_max iterative steps. The state vector x(k) is chosen randomly at Step 4. Suppose that the associated probability density function is nonvanishing everywhere. Then we can assume that all the states will be explored, so the resulting networks tend to satisfy (2.46) and (2.50) for all state vectors x(k). The limits of λ̂_i and v̂_i will approximate the optimal ones λ^* and u^*, respectively. The parameters ε_0 and i_max are chosen by the designer. The smaller the value of ε_0, the more accurate the costate function and the optimal control law will be. If the condition in Step 7 is satisfied, the costate function sequence has converged with the pre-specified accuracy. The larger the value of i_max in Step 8, the more accurate the obtained control law v̂(x) will be, at the price of an increased computational burden.
2.2.3 Simulations
In this subsection, two examples are provided to demonstrate the effectiveness of the developed control scheme.
Example 2.9 (Nonlinear Discrete-Time System) Consider the following nonlinear system [5]:
x(k+1) = f(x(k)) + g(x(k))u(k),   (2.65)
where f(x(k)) = [ −0.8x_2(k),  sin(0.8x_1(k) − x_2(k)) + 1.8x_2(k) ]^T, g(x(k)) = [ 0,  −x_2(k) ]^T, and assume that the control constraint is set to |u| ≤ 0.3.
Define the cost functional as
J(x(k), u(·)) = \sum_{i=k}^{∞} ( x^T(i)Qx(i) + 2\int_0^{u(i)} tanh^{-T}(Ū^{-1}s)Ū R̄ ds ),   (2.66)
where Ū = 0.3, and the weight matrices are chosen as Q = [1, 0; 0, 1] and R = [0.5].
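For reference, the plant and cost settings of this example can be written out as the following Python sketch; the first component of f uses the subscript reconstructed above, and the numerical quadrature of the nonquadratic input cost is an illustrative choice.

```python
import numpy as np

U_bar, R = 0.3, 0.5
Q = np.eye(2)

def f(x):
    # dynamics of (2.65), as reconstructed above
    return np.array([-0.8 * x[1], np.sin(0.8 * x[0] - x[1]) + 1.8 * x[1]])

def g(x):
    return np.array([0.0, -x[1]])

def utility(x, u, n_quad=50):
    # x^T Q x + 2*int_0^u arctanh(s/U_bar)*U_bar*R ds, as in (2.66)
    s = np.linspace(0.0, float(u), n_quad)
    integrand = np.arctanh(np.clip(s / U_bar, -0.999, 0.999)) * U_bar * R
    return float(x @ Q @ x) + 2.0 * np.trapz(integrand, s)

x = np.array([0.5, 0.5])                      # initial state used in the simulations
u = 0.1                                       # an arbitrary admissible input
x_next = f(x) + g(x) * u
print(utility(x, u), x_next)
```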
First, we perform the simulation of iterative ADP algorithm. In this iterative
algorithm, we choose RBFNNs as the critic network, the action network and the
model network with the structure 2–9–2, 2–9–1 and 3–9–2, respectively. The train-
ing sets are selected as −1 ≤ x1 ≤ 1 and −1 ≤ x2 ≤ 1, which is the operation
region of the system. It should be mentioned that the model network should be
trained first. The initial state vectors are chosen randomly from [−1, 1]. Under the
learning rate of αm = 0.1, the model network is trained until the given accuracy
εm = 10−6 is reached. After the training of the model network is completed, the
weights are kept unchanged. Then, the critic network and the action network are
trained with the learning rates α_a = α_c = 0.1 and the inner-loop iteration numbers j^c_max = j^a_max = 2000. Meanwhile, the pre-specified accuracy ε_0 is set to 10^{-20}. Denote the outer loop iteration number as L. After implementing the outer loop itera-
tion for L = imax = 100, the convergence curves of the costate function are shown in
Fig. 2.2. It can be seen that the costate function is basically convergent with the outer
loop iteration L > 15. In order to compare the different actions of the control laws
obtained under different outer loop iteration numbers, for the same initial state vec-
tor x1 (0) = 0.5 and x2 (0) = 0.5, we apply different control laws to the plant for 30
time steps and obtain the simulation results as follows. The state curves are shown
in Figs. 2.3 and 2.4, and the corresponding control inputs are shown in Fig. 2.5. It
can be seen that the system responses are improved when the outer loop iteration
number L is increased. When L > 80, the system responses only improve slightly
in performance.
Fig. 2.2 The convergence process of the costate function at x = (0.3, −0.5), x = (−0.2, 0.2),
x = (0.8, 0.6)
Fig. 2.3 The state trajectory x1 for L = 2, 30, 80, 100
It should be mentioned that, in order to show the convergence characteristics of the iterative process more clearly, we set the required accuracy ε_0 to a very small number, 10^{-20}, and we set the maximum iteration number to twice what is needed.
Fig. 2.4 The state trajectory x2 for L = 2, 30, 80, 100
Fig. 2.5 The control input u for L = 2, 30, 80, 100
In this way, the given accuracy ε_0 does not take effect even when the maximum iteration number is reached; the maximum iteration number i_max therefore becomes the stopping criterion in this case. If the designer wants to save running time, the pre-specified accuracy ε_0 can be set to a normal value so that the iterative process stops once the accuracy ε_0 is reached.
Fig. 2.6 The state variable curves without considering the actuator saturation in the controller design
Moreover, in order to make a comparison with the controller designed without considering the actuator saturation, we also present the system responses obtained by such a controller. The actuator saturation does exist, however, so in the simulation the control input is limited to the bound value whenever it overruns the saturation bound. After simulation, the state curves are as shown in Fig. 2.6, and the control curve is shown in Fig. 2.7.
From the simulation results, we can see that the iterative costate function sequences converge to the optimal ones very quickly, which also indicates the validity of the iterative ADP algorithm for dealing with constrained nonlinear systems. Comparing Fig. 2.5 with Fig. 2.7, we can see that in Fig. 2.5 the restriction of actuator saturation has been overcome successfully, whereas in Fig. 2.7 the control input overruns the saturation bound and is therefore limited to the bound value. From this point, we can conclude that the present iterative ADP algorithm is effective in dealing with the constrained optimal control problem.
Example 2.10 (Mass–Spring System) Consider the following discrete-time nonlin-
ear mass–spring system:
x_1(k+1) = 0.05x_2(k) + x_1(k),
x_2(k+1) = −0.0005x_1(k) − 0.0335x_1^3(k) + 0.05u(k) + x_2(k),   (2.67)
where x(k) is the state vector, and u(k) is the control input.
Fig. 2.7 The control input curve without considering the actuator saturation in the controller design
Define the cost functional as
J(x(k), u(·)) = \sum_{i=k}^{∞} ( x^T(i)Qx(i) + 2\int_0^{u(i)} tanh^{-T}(Ū^{-1}s)Ū R̄ ds ),   (2.68)
where the control constraint is set to Ū = 0.6, and the weight matrices are chosen as Q = [0, 0.5; 0.5, 0] and R = [1]. The training sets are −1 ≤ x_1 ≤ 1 and −1 ≤ x_2 ≤ 1.
The critic network, the action network and the model network are chosen as RBF
neural networks with the structure of 2–16–2, 2–16–1 and 3–16–2, respectively.
In the training process, the learning rates are set to αa = αc = 0.1. The other pa-
rameters are set the same as those in Example 2.9. After implementing the outer
loop iteration for L = imax = 300, the convergence curves of the costate function
are shown in Fig. 2.8. It can be seen that the costate function is basically conver-
gent with the outer loop iteration L > 200. In order to compare the different actions
of the control laws obtained under different outer loop iteration numbers, for the
same initial state vector x1 (0) = −1 and x2 (0) = 1, we apply different control laws
to the plant for 300 time steps and obtain the simulation results as follows. The
state curves are shown in Figs. 2.9, 2.10, and the corresponding control inputs are
shown in Fig. 2.11. It can be seen that the closed-loop system is divergent when
using the control law obtained by L = 2, and the system’s responses are improved
when the outer loop iteration number L is increased. When L > 200, the system
Fig. 2.8 The convergence process of the costate function at x = (−0.5, 0.2), x = (0.4, −0.6),
x = (0, −0.3)
Fig. 2.9 The state trajectory x1 for L = 2, 10, 30, 200
responses basically remain unchanged with no significant improvement in perfor-
mance.
Fig. 2.10 The state trajectory x2 for L = 2, 10, 30, 200
Fig. 2.11 The control input u for L = 2, 10, 30, 200
In order to make a comparison with the controller designed without considering the actuator saturation, we also present the results of the controller designed by the iterative ADP algorithm regardless of the actuator saturation. The state curves are shown in Fig. 2.12 and the control curve is shown in Fig. 2.13.
Fig. 2.12 The state curves without considering the actuator saturation in controller design
Fig. 2.13 The control curves without considering the actuator saturation in controller design
From the simulation results, we can see that the iterative costate function sequence does converge to the optimal one very quickly. Comparing Fig. 2.11 with Fig. 2.13, we can find that in Fig. 2.11 the restriction of actuator saturation has been overcome successfully, which further verifies the effectiveness of the present iterative ADP algorithm.
2.3 Infinite-Horizon Optimal State Feedback Control Based on GDHP
2.3.1 Problem Formulation
In this section, we will study the discrete-time nonlinear systems described by
x(k + 1) = f (x(k)) + g(x(k))u(k), (2.69)
where x(k) ∈ Rn is the state vector and u(k) ∈ Rm is the control vector, f (·) and
g(·) are differentiable in their arguments with f (0) = 0. Assume that f + gu is
Lipschitz continuous on a set Ω in Rn containing the origin, and that the system
(2.69) is controllable in the sense that there exists a continuous control law on Ω
that asymptotically stabilizes the system.
Let x(0) be an initial state and define u_0^{N−1} = (u(0), u(1), ..., u(N−1)) to be a control sequence with which the system (2.69) gives a trajectory starting from x(0): x(1) = f(x(0)) + g(x(0))u(0), x(2) = f(x(1)) + g(x(1))u(1), ..., x(N) = f(x(N−1)) + g(x(N−1))u(N−1). We call the number of elements in the control sequence u_0^{N−1} the length of u_0^{N−1} and denote it as |u_0^{N−1}|. Then, |u_0^{N−1}| = N. The final state under the control sequence u_0^{N−1} can be denoted x^{(f)}(x(0), u_0^{N−1}) = x(N). When the control sequence starting from u(0) has infinite length, we denote it as u_0^∞ = (u(0), u(1), ...) and then the corresponding final state can be written as x^{(f)}(x(0), u_0^∞) = \lim_{k→∞} x(k).
Definition 2.11 A nonlinear dynamical system is said to be stabilizable on a compact set Ω ⊂ R^n, if for all initial conditions x(0) ∈ Ω, there exists a control sequence u_0^∞ = (u(0), u(1), ...), u(i) ∈ R^m, i = 0, 1, ..., such that the state x^{(f)}(x(0), u_0^∞) = 0.
Let u_k^∞ = (u(k), u(k+1), ...) be the control sequence starting at k. It is desired to find the control sequence u_k^∞ which minimizes the infinite-horizon cost functional given by
J(x(k), u_k^∞) = \sum_{i=k}^{∞} γ^{i−k} l(x(i), u(i)),   (2.70)
where l is the utility function, l(0, 0) = 0, l(x(i), u(i)) ≥ 0 for ∀ x(i), u(i), and γ is
the discount factor with 0 < γ ≤ 1. Generally speaking, the utility function can be
chosen as the quadratic form as follows:
l(x(i), u(i)) = x T (i)Qx(i) + uT (i)Ru(i).
For optimal control problems, the designed feedback control must not only sta-
bilize the system on Ω but also guarantee that (2.70) is finite, i.e., the control must
be admissible.
It is noted that a control law sequence {η_i} = (η_N, ..., η_1, η_0), N → ∞, is called admissible if the resultant control sequence (u(0), u(1), ..., u(N)) stabilizes system (2.69) for any initial state x(0) and guarantees that J(x(0), u_0^N) is finite. In this case, it should be mentioned that each control action obeys a different control law, i.e., the control action u(i) is produced by the control law η_{N−i}, or u(i) = η_{N−i}(x(i)), for i = 0, 1, ..., N, N → ∞.
Let
A_{x(k)} = { u_k^∞ : x^{(f)}(x(k), u_k^∞) = 0 }
be the set of all infinite-horizon admissible control sequences of x(k). Define the optimal value function as
J^*(x(k)) = \inf_{u_k^∞} { J(x(k), u_k^∞) : u_k^∞ ∈ A_{x(k)} }.   (2.71)
Note that (2.70) can be written as
J(x(k), u_k^∞) = x^T(k)Qx(k) + u^T(k)Ru(k) + γ \sum_{i=k+1}^{∞} γ^{i−k−1} l(x(i), u(i))
             = x^T(k)Qx(k) + u^T(k)Ru(k) + γ J(x(k+1), u_{k+1}^∞).   (2.72)
According to Bellman’s optimality principle, it is known that, for the case of infinite-
horizon optimization, the optimal value function J ∗ (x(k)) is time invariant and sat-
isfies the DTHJB equation
J^*(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ J^*(x(k+1)) }.   (2.73)
The optimal control u∗ satisfies the first-order necessary condition, which is
given by the gradient of the right hand side of (2.73) with respect to u(k) as
∂( x^T(k)Qx(k) + u^T(k)Ru(k) )/∂u(k) + γ ( ∂x(k+1)/∂u(k) )^T ∂J^*(x(k+1))/∂x(k+1) = 0.
Then, we obtain
u^*(x(k)) = −(γ/2) R^{-1} g^T(x(k)) ∂J^*(x(k+1))/∂x(k+1).   (2.74)
By substituting (2.74) into (2.73), the DTHJB equation becomes
J^*(x(k)) = x^T(k)Qx(k) + (γ²/4) ( ∂J^*(x(k+1))/∂x(k+1) )^T g(x(k)) R^{-1} g^T(x(k)) ∂J^*(x(k+1))/∂x(k+1) + γ J^*(x(k+1)),   (2.75)
where J ∗ (x(k)) is the optimal value function corresponding to the optimal control
law u∗ (x(k)). When dealing with the linear quadratic regulator (LQR) optimal con-
trol problems, this equation reduces to the Riccati equation which can be efficiently
solved. In the general nonlinear case, however, the HJB equation cannot be solved
exactly.
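As a quick check of the LQR special case, the following Python sketch runs the value-iteration form of (2.73) for a linear system x(k+1) = Ax(k) + Bu(k) with quadratic cost, which collapses to a Riccati recursion; the matrices A and B are illustrative and are not taken from the chapter.

```python
import numpy as np

# LQR special case: with V_i(x) = x^T P_i x, the recursion behind (2.73) becomes
# P_{i+1} = Q + gamma*A^T P_i A - gamma^2*A^T P_i B (R + gamma*B^T P_i B)^{-1} B^T P_i A.
A = np.array([[1.0, 0.05], [-0.02, 0.98]])
B = np.array([[0.0], [0.05]])
Q, R, gamma = np.eye(2), np.array([[1.0]]), 1.0

P = np.zeros((2, 2))                          # V_0 = 0 corresponds to P_0 = 0
for _ in range(500):
    S = R + gamma * B.T @ P @ B
    P = Q + gamma * A.T @ P @ A - gamma**2 * A.T @ P @ B @ np.linalg.inv(S) @ B.T @ P @ A

K = gamma * np.linalg.inv(R + gamma * B.T @ P @ B) @ B.T @ P @ A
print("Riccati solution P:\n", P)
print("Optimal gain K (u = -K x):\n", K)
```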
2.3.2 Infinite-Horizon Optimal State Feedback Control Based on GDHP
Four parts are included in this subsection. In the first part, the unknown nonlinear
system is identified via an NN system identification scheme with stability proof.
The iterative ADP algorithm is introduced in the second part, while in the third
part, the corresponding convergence proof is developed. Then, in the fourth part,
the implementation of the iterative ADP algorithm based on NN is described in
detail.
2.3.2.1 NN Identification of the Unknown Nonlinear System
For the design of the NN identifier, a three-layer NN is considered as the function approximation structure. Let the number of hidden-layer neurons be denoted by l, the ideal weight matrix between the input layer and the hidden layer be denoted by ν_m^*, and the ideal weight matrix between the hidden layer and the output layer be denoted by ω_m^*. According to the universal approximation property [8] of NNs, the system dynamics (2.69) has an NN representation on a compact set S, which can be written as
x(k+1) = ω_m^{*T} σ( ν_m^{*T} z(k) ) + θ(k).   (2.76)
In (2.76), z(k) = [x^T(k)  u^T(k)]^T is the NN input, θ(k) is the bounded NN functional approximation error according to the universal approximation property, and [σ(z̄)]_i = (e^{z̄_i} − e^{−z̄_i})/(e^{z̄_i} + e^{−z̄_i}), i = 1, 2, ..., l, are the activation functions selected in this work, where z̄(k) = ν_m^{*T} z(k), z̄(k) ∈ R^l. Additionally, the NN activation functions are bounded such that ‖σ(z̄(k))‖ ≤ σ_M for a constant σ_M.
In the system identification process, we keep the weight matrix between the input layer and the hidden layer constant and only tune the weight matrix between the hidden layer and the output layer. So, we define the NN system identification scheme as
x̂(k+1) = ω_m^T(k) σ(z̄(k)),   (2.77)
where x̂(k) is the estimated system state vector, and ω_m(k) is the estimate of the constant ideal weight matrix ω_m^*.
Denote x̃(k) = x̂(k) − x(k) as the system identification error. Combining (2.76) and (2.77), we can obtain the identification error dynamics as
x̃(k+1) = ω̃_m^T(k) σ(z̄(k)) − θ(k),   (2.78)
where ω̃_m(k) = ω_m(k) − ω_m^*. Let ψ(k) = ω̃_m^T(k) σ(z̄(k)). Then, (2.78) can be rewritten as
x̃(k+1) = ψ(k) − θ(k).   (2.79)
The weights in the system identification process are updated to minimize the
following performance measure:
E(k+1) = (1/2) x̃^T(k+1) x̃(k+1).   (2.80)
Using the gradient-based adaptation rule, the weights can be updated as
ω_m(k+1) = ω_m(k) − α_m ∂E(k+1)/∂ω_m(k) = ω_m(k) − α_m σ(z̄(k)) x̃^T(k+1),   (2.81)
where αm > 0 is the NN learning rate.
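A minimal sketch of the identifier (2.76)-(2.81) is given below, with tanh activations and illustrative layer sizes; only the output-layer weights ω_m are tuned, as in the text, and the data in the usage line are made up.

```python
import numpy as np

n, m, l = 2, 1, 8                             # state, input, hidden sizes (illustrative)
rng = np.random.default_rng(0)
nu_m = rng.normal(size=(n + m, l))            # fixed input-to-hidden weights
w_m = np.zeros((l, n))                        # tuned hidden-to-output weights

def identify_step(w_m, x, u, x_next, alpha_m=0.05):
    """One update of (2.81) using the tanh features sigma(nu_m^T z) of (2.76)."""
    z = np.concatenate([x, u])                # z(k) = [x^T(k) u^T(k)]^T
    phi = np.tanh(nu_m.T @ z)                 # bounded activations
    x_tilde = w_m.T @ phi - x_next            # identification error x_hat(k+1) - x(k+1)
    w_m = w_m - alpha_m * np.outer(phi, x_tilde)   # gradient rule (2.81)
    return w_m, float(x_tilde @ x_tilde)

w_m, err = identify_step(w_m, np.array([0.1, -0.2]), np.array([0.05]),
                         np.array([0.09, -0.15]))
```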
We now give the following assumption before presenting the asymptotic stability proof of the state estimation error x̃(k).
Assumption 2.12 The NN approximation error term θ(k) is assumed to be upper bounded by a function of the state estimation error x̃(k) such that
θ^T(k)θ(k) ≤ θ_{Mk} = δ x̃^T(k)x̃(k),   (2.82)
where δ is the constant target value with δ_M as its upper bound, i.e., δ ≤ δ_M.
Next, the stability analysis of the present NN-based system identification scheme
is presented by using the Lyapunov theory.
Theorem 2.13 (cf. [10]) Let the identification scheme (2.77) be used to identify the nonlinear system (2.69), and let the parameter update law given in (2.81) be used for tuning the NN weights. Then, the state estimation error dynamics x̃(k) is asymptotically stable while the parameter estimation error ω̃_m(k) is bounded.
Proof Consider the following positive definite Lyapunov function candidate:
L_k = L_{1k} + L_{2k},   (2.83)
where
L_{1k} = x̃^T(k)x̃(k),   L_{2k} = (1/α_m) tr{ ω̃_m^T(k)ω̃_m(k) }.
Taking the first difference of the Lyapunov function (2.83) and substituting the iden-
tification error dynamics (2.79) and the NN weight update law (2.81) reveal that
ΔL_{1k} = x̃^T(k+1)x̃(k+1) − x̃^T(k)x̃(k)
       = ψ^T(k)ψ(k) − 2ψ^T(k)θ(k) + θ^T(k)θ(k) − x̃^T(k)x̃(k),
ΔL_{2k} = (1/α_m) tr{ ω̃_m^T(k+1)ω̃_m(k+1) − ω̃_m^T(k)ω̃_m(k) }
       = (1/α_m) tr{ −2α_m ψ(k)x̃^T(k+1) + α_m² x̃(k+1)σ^T(z̄(k))σ(z̄(k))x̃^T(k+1) }
       = −2ψ^T(k)x̃(k+1) + α_m σ^T(z̄(k))σ(z̄(k)) x̃^T(k+1)x̃(k+1).
After applying the Cauchy–Schwarz inequality ((a_1 + a_2 + ⋯ + a_n)^T(a_1 + a_2 + ⋯ + a_n) ≤ n(a_1^T a_1 + a_2^T a_2 + ⋯ + a_n^T a_n)) to ΔL_{2k}, we have
ΔL_{2k} ≤ −2ψ^T(k)(ψ(k) − θ(k)) + 2α_m σ^T(z̄(k))σ(z̄(k)) ( ψ^T(k)ψ(k) + θ^T(k)θ(k) ).
Therefore, we can find that
ΔL_k ≤ −ψ^T(k)ψ(k) + θ^T(k)θ(k) − x̃^T(k)x̃(k) + 2α_m σ^T(z̄(k))σ(z̄(k)) ( ψ^T(k)ψ(k) + θ^T(k)θ(k) ).
Considering ‖σ(z̄(k))‖ ≤ σ_M and (2.82), we obtain
ΔL_k ≤ −(1 − 2α_m σ_M²)‖ψ(k)‖² − (1 − δ_M − 2α_m δ_M σ_M²)‖x̃(k)‖².   (2.84)
Define α_m ≤ ρ²/(2σ_M²); then (2.84) becomes
ΔL_k ≤ −(1 − ρ²)‖ψ(k)‖² − (1 − δ_M − δ_M ρ²)‖x̃(k)‖²
     = −(1 − ρ²)‖ω̃_m^T(k)σ(z̄(k))‖² − (1 − δ_M − δ_M ρ²)‖x̃(k)‖².   (2.85)
From (2.85), we can conclude that ΔL_k ≤ 0 provided 0 < δ_M < 1 and
max{ −√((1 − δ_M)/δ_M), −1 } ≤ ρ ≤ min{ √((1 − δ_M)/δ_M), 1 },
where ρ ≠ 0. As long as the parameters are selected as discussed above, ΔL_k ≤ 0 in (2.85), which shows stability in the sense of Lyapunov. Therefore, x̃(k) and ω̃_m(k) are bounded, provided x̃(0) and ω̃_m(0) are bounded in the compact set S. Furthermore,
by summing both sides of (2.85) to infinity and taking account of ΔLk ≤ 0, we have
\sum_{k=0}^{∞} ΔL_k = \lim_{k→∞} L_k − L_0 < ∞.
This implies that
\sum_{k=0}^{∞} [ (1 − ρ²)‖ω̃_m^T(k)σ(z̄(k))‖² + (1 − δ_M − δ_M ρ²)‖x̃(k)‖² ] < ∞.
Hence, it can be concluded that the estimation error approaches zero, i.e., x̃(k) → 0 as k → ∞.
Remark 2.14 According to Theorem 2.13, after a sufficient learning session, the NN system identification error converges to zero, i.e., we have
f(x(k)) + ĝ(x(k))u(k) = ω_m^T(k) σ(z̄(k)),   (2.86)
where ĝ(x(k)) denotes the estimated value of the control coefficient matrix g(x(k)). Taking the partial derivative of both sides of (2.86) with respect to u(k) yields
ĝ(x(k)) = ∂( ω_m^T(k) σ(z̄(k)) )/∂u(k) = ω_m^T(k) ( ∂σ(z̄(k))/∂z̄(k) ) ν_m^{*T} ∂z(k)/∂u(k),   (2.87)
where
∂z(k)/∂u(k) = [ 0_{n×m} ; I_m ],
and I_m is the m × m identity matrix.
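Using the same illustrative shapes as the previous identifier sketch, (2.87) can be evaluated as follows; g_hat, nu_m, and w_m are assumed names, not identifiers from the text.

```python
import numpy as np

def g_hat(x, u, nu_m, w_m, n, m):
    """Evaluate (2.87): g_hat(x(k)) = omega_m^T * dsigma/dz_bar * nu_m^T * dz/du."""
    z = np.concatenate([x, u])
    z_bar = nu_m.T @ z
    dsigma = np.diag(1.0 - np.tanh(z_bar) ** 2)        # derivative of the tanh activations
    dz_du = np.vstack([np.zeros((n, m)), np.eye(m)])   # [0_{n x m}; I_m]
    return w_m.T @ dsigma @ nu_m.T @ dz_du             # resulting n x m control-gain estimate

# tiny self-contained usage with random illustrative weights
rng = np.random.default_rng(1)
G = g_hat(np.array([0.1, -0.2]), np.array([0.0]),
          nu_m=rng.normal(size=(3, 8)), w_m=rng.normal(size=(8, 2)), n=2, m=1)
```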
Next, this result will be used in the derivation and implementation of the iterative
ADP algorithm for the optimal control of unknown discrete-time nonlinear systems.
2.3.2.2 Derivation of the Iterative ADP Algorithm
In this part, we mainly present the iterative ADP algorithm. First, we start with the initial value function V_0(·) = 0, and then solve for the single control law v_0(x(k)) as follows:
v_0(x(k)) = \arg\min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_0(x(k+1)) }.   (2.88)
Once the control law v0 (x(k)) is determined, we update the cost function as
V_1(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_0(x(k+1)) } = x^T(k)Qx(k) + v_0^T(x(k))Rv_0(x(k)).   (2.89)
Therefore, for i = 1, 2, . . . , the iterative ADP algorithm can be used to implement
the iteration between the control law
v_i(x(k)) = \arg\min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_i(x(k+1)) }
          = −(γ/2) R^{-1} ĝ^T(x(k)) ∂V_i(x(k+1))/∂x(k+1)   (2.90)
and the value function
V_{i+1}(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_i(x(k+1)) }
             = x^T(k)Qx(k) + v_i^T(x(k))Rv_i(x(k)) + γ V_i(x(k+1)).   (2.91)
In the above recurrent iteration, i is the iteration index of the control law and
value function, while k is the time index of the system’s control and state trajec-
tories. The value function and control law are updated until they converge to the
optimal ones. In the following part, we will present a proof of convergence of the
iteration between (2.90) and (2.91) with the value function Vi → J ∗ and the control
law vi → u∗ as i → ∞.
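The structure of the loop between (2.90) and (2.91) can be sketched as below; model, g_hat, critic_value, and critic_grad are placeholders for the networks introduced in the implementation part, and the small fixed-point loop for v is an assumption of this sketch, not part of the chapter's algorithm.

```python
import numpy as np

def gdhp_outer_step(x_batch, model, g_hat, critic_value, critic_grad, Q, R, gamma):
    """Return the targets (V_{i+1}, v_i) of (2.90)-(2.91) for a batch of sampled states."""
    targets_V, controls = [], []
    for x in x_batch:
        # control law (2.90), using the critic gradient at the predicted next state
        v = np.zeros(R.shape[0])
        for _ in range(5):                                    # crude fixed point in v
            x_next = model(x, v)
            v = -0.5 * gamma * np.linalg.solve(R, g_hat(x).T @ critic_grad(x_next))
        x_next = model(x, v)
        # value-function target (2.91)
        targets_V.append(x @ Q @ x + v @ R @ v + gamma * critic_value(x_next))
        controls.append(v)
    return np.array(targets_V), np.array(controls)
```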
2.3.2.3 Convergence Analysis of the Iterative ADP Algorithm
Lemma 2.15 Let {μi } be an arbitrary sequence of control laws and {vi } be the
control law sequence described in (2.90). Define Vi as in (2.91) and Λi as
Λ_{i+1}(x(k)) = x^T(k)Qx(k) + μ_i^T(x(k))Rμ_i(x(k)) + γ Λ_i(x(k+1)).   (2.92)
If V0 (x(k)) = Λ0 (x(k)) = 0, then Vi (x(k)) ≤ Λi (x(k)), ∀i.
Proof It can easily be derived noticing that Vi+1 is the result of minimizing the right
hand side of (2.91) with respect to the control input u(k), while Λi+1 is a result of
arbitrary control input.
Lemma 2.16 Let the value function sequence {Vi } be defined as in (2.91). If the
system is controllable, then there is an upper bound Y such that 0 ≤ Vi (x(k)) ≤ Y ,
∀i.
Proof Let {ηi (x)} be a sequence of admissible control laws, and let V0 (·) = Z0 (·) =
0, where Vi is updated as in (2.91) and Zi is updated by
Z_{i+1}(x(k)) = x^T(k)Qx(k) + η_i^T(x(k))Rη_i(x(k)) + γ Z_i(x(k+1)).   (2.93)
It is clear that
Z_i(x(k+1)) = x^T(k+1)Qx(k+1) + η_{i−1}^T(x(k+1))Rη_{i−1}(x(k+1)) + γ Z_{i−1}(x(k+2)).   (2.94)
Noticing that l(x(k), η_i(x(k))) = x^T(k)Qx(k) + η_i^T(x(k))Rη_i(x(k)), we can further obtain
Z_{i+1}(x(k)) = l(x(k), η_i(x(k))) + γ l(x(k+1), η_{i−1}(x(k+1))) + γ² Z_{i−1}(x(k+2))
            = l(x(k), η_i(x(k))) + γ l(x(k+1), η_{i−1}(x(k+1))) + γ² l(x(k+2), η_{i−2}(x(k+2))) + γ³ Z_{i−2}(x(k+3))
            ⋮
            = l(x(k), η_i(x(k))) + γ l(x(k+1), η_{i−1}(x(k+1))) + γ² l(x(k+2), η_{i−2}(x(k+2))) + ⋯ + γ^i l(x(k+i), η_0(x(k+i))) + γ^{i+1} Z_0(x(k+i+1)),   (2.95)
where Z_0(x(k+i+1)) = 0. Then, (2.95) can be written as
Z_{i+1}(x(k)) = \sum_{j=0}^{i} γ^j l(x(k+j), η_{i−j}(x(k+j)))
            = \sum_{j=0}^{i} γ^j [ x^T(k+j)Qx(k+j) + η_{i−j}^T(x(k+j))Rη_{i−j}(x(k+j)) ]
            ≤ \lim_{i→∞} \sum_{j=0}^{i} γ^j [ x^T(k+j)Qx(k+j) + η_{i−j}^T(x(k+j))Rη_{i−j}(x(k+j)) ].   (2.96)
Since {ηi (x)} is an admissible control law sequence, we have x(k) → 0 as k → ∞,
and there exists an upper bound Y such that
Z_{i+1}(x(k)) ≤ \lim_{i→∞} \sum_{j=0}^{i} γ^j l(x(k+j), η_{i−j}(x(k+j))) ≤ Y, ∀i.   (2.97)
By using Lemma 2.15, we obtain
Vi+1 (x(k)) ≤ Zi+1 (x(k)) ≤ Y, ∀i. (2.98)
Based on Lemmas 2.15 and 2.16, we now present our main theorems.
Theorem 2.17 Define the value function sequence {Vi } as in (2.91) with V0 (·) =
0, and the control law sequence {vi } as in (2.90). Then, {Vi } is a monotonically
nondecreasing sequence satisfying Vi+1 ≥ Vi , ∀i.
Proof Define a new sequence
Φ_{i+1}(x(k)) = x^T(k)Qx(k) + v_{i+1}^T(x(k))Rv_{i+1}(x(k)) + γ Φ_i(x(k+1))   (2.99)
with Φ0 (·) = V0 (·) = 0. Let the control law sequence {vi } and the value function
sequence {Vi } be updated as in (2.90) and (2.91), respectively.
In the following part, we prove that Φi (x(k)) ≤ Vi+1 (x(k)) by mathematical
induction.
First, we prove that it holds for i = 0. Considering
V_1(x(k)) − Φ_0(x(k)) = x^T(k)Qx(k) + v_0^T(x(k))Rv_0(x(k)) ≥ 0;
then, for i = 0, we get
V1 (x(k)) ≥ Φ0 (x(k)). (2.100)
Second, we assume that it holds for i − 1, i.e., Vi (x(k)) ≥ Φi−1 (x(k)), ∀x(k). Then,
for i, noticing that
Vi+1 (x(k)) = x T (k)Qx(k) + viT (x(k))Rvi (x(k)) + γ Vi (x(k + 1))
and
Φi (x(k)) = x T (k)Qx(k) + viT (x(k))Rvi (x(k)) + γ Φi−1 (x(k + 1)),
we get
Vi+1 (x(k)) − Φi (x(k)) = γ (Vi (x(k + 1)) − Φi−1 (x(k + 1))) ≥ 0
i.e.,
Vi+1 (x(k)) ≥ Φi (x(k)). (2.101)
Thus, we complete the proof through mathematical induction.
Furthermore, from Lemma 2.15 we know that Vi (x(k)) ≤ Φi (x(k)), therefore,
we have
Vi+1 (x(k)) ≥ Φi (x(k)) ≥ Vi (x(k)). (2.102)
We have reached the conclusion that the value function sequence {Vi } is a mono-
tonically nondecreasing sequence with an upper bound, and therefore, its limit ex-
ists. Now, we can derive the following theorem.
Theorem 2.18 For any state vector x(k), define \lim_{i→∞} V_i(x(k)) = V_∞(x(k)) as the limit of the value function sequence {V_i}. Then, the following equation holds:
V_∞(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)) }.
Proof For any u(k) and i, according to (2.91), we can derive
Vi (x(k)) ≤ x T (k)Qx(k) + uT (k)Ru(k) + γ Vi−1 (x(k + 1)).
Combining with
Vi (x(k)) ≤ V∞ (x(k)), ∀i (2.103)
which is obtained from Theorem 2.17, we have
Vi (x(k)) ≤ x T (k)Qx(k) + uT (k)Ru(k) + γ V∞ (x(k + 1)), ∀i.
Letting i → ∞, we obtain
V_∞(x(k)) ≤ x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)).
Note that in the above inequality, u(k) is chosen arbitrarily; thus, we obtain
V_∞(x(k)) ≤ \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)) }.   (2.104)
On the other hand, since the value function sequence satisfies
V_i(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_{i−1}(x(k+1)) }
for any i, considering (2.103), we have
V_∞(x(k)) ≥ \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_{i−1}(x(k+1)) }, ∀i.
Let i → ∞; we get
V_∞(x(k)) ≥ \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)) }.   (2.105)
Based on (2.104) and (2.105), we conclude that V_∞(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)) }.
Next, we will prove that the value function sequence {Vi } converges to the opti-
mal value function J ∗ (x(k)) as i → ∞.
Theorem 2.19 (cf. [10]) Define the value function sequence {Vi } as in (2.91) with
V0 (·) = 0. If the system state x(k) is controllable, then J ∗ is the limit of the value
function sequence {Vi }, i.e.,
lim Vi (x(k)) = J ∗ (x(k)).
i→∞
Proof Let {η_i^{(l)}} be the lth admissible control law sequence. We construct the associated sequence {P_i^{(l)}(x)} as follows:
P_{i+1}^{(l)}(x(k)) = x^T(k)Qx(k) + η_i^{(l)T}(x(k))Rη_i^{(l)}(x(k)) + γ P_i^{(l)}(x(k+1)),   (2.106)
with P_0^{(l)}(·) = 0. Similar to the derivation of (2.95), we get
P_{i+1}^{(l)}(x(k)) = \sum_{j=0}^{i} γ^j l( x(k+j), η_{i−j}^{(l)}(x(k+j)) ).   (2.107)
Using Lemmas 2.15 and 2.16, we have
V_{i+1}(x(k)) ≤ P_{i+1}^{(l)}(x(k)) ≤ Y_l, ∀l, i,   (2.108)
where Y_l is the upper bound associated with the sequence {P_{i+1}^{(l)}(x(k))}. Denote
\lim_{i→∞} P_i^{(l)}(x(k)) = P_∞^{(l)}(x(k));
then, we obtain
V_∞(x(k)) ≤ P_∞^{(l)}(x(k)) ≤ Y_l, ∀l.   (2.109)
Let the corresponding control sequence associated with (2.107) be
^{(l)}û_k^{k+i} = ( ^{(l)}û(k), ^{(l)}û(k+1), ..., ^{(l)}û(k+i) ) = ( η_i^{(l)}(x(k)), η_{i−1}^{(l)}(x(k+1)), ..., η_0^{(l)}(x(k+i)) );
then we have
J( x(k), ^{(l)}û_k^{k+i} ) = \sum_{j=0}^{i} γ^j l( x(k+j), η_{i−j}^{(l)}(x(k+j)) ) = P_{i+1}^{(l)}(x(k)).   (2.110)
Letting i → ∞, and denoting the admissible control sequence related to P_∞^{(l)}(x(k)) with infinite length as ^{(l)}û_k^∞, we get
J( x(k), ^{(l)}û_k^∞ ) = \sum_{j=0}^{∞} γ^j l( x(k+j), ^{(l)}û(k+j) ) = P_∞^{(l)}(x(k)).   (2.111)
Then, according to the definition of J^*(x(k)) in (2.71), for any ε > 0, there exists a sequence of admissible control laws {η_i^{(M)}} such that the associated cost function
J( x(k), ^{(M)}û_k^∞ ) = \sum_{j=0}^{∞} γ^j l( x(k+j), ^{(M)}û(k+j) ) = P_∞^{(M)}(x(k))   (2.112)
satisfies J( x(k), ^{(M)}û_k^∞ ) ≤ J^*(x(k)) + ε. Combining with (2.109), we have
V_∞(x(k)) ≤ P_∞^{(M)}(x(k)) ≤ J^*(x(k)) + ε.   (2.113)
Since ε is chosen arbitrarily, we get
V∞ (x(k)) ≤ J ∗ (x(k)). (2.114)
On the other hand, because V_{i+1}(x(k)) ≤ P_{i+1}^{(l)}(x(k)) ≤ Y_l, ∀l, i, we can get V_∞(x(k)) ≤ \inf_l {Y_l}. According to the definition of an admissible control law sequence, the control law sequence associated with the cost function V_∞(x(k)) must be an admissible control law sequence. We can see that there exists a sequence of admissible control laws {η_i^{(N)}} such that V_∞(x(k)) = P_∞^{(N)}(x(k)). Combining with (2.111), we get V_∞(x(k)) = J(x(k), ^{(N)}û_k^∞). Since J^*(x(k)) is the infimum of the cost over all admissible control sequences starting at k with infinite length, we obtain
V_∞(x(k)) ≥ J^*(x(k)).   (2.115)
Based on (2.114) and (2.115), we can conclude that J ∗ is the limit of the value
function sequence {Vi }, i.e., V∞ (x(k)) = J ∗ (x(k)).
From Theorems 2.17 and 2.18, we can derive that the limit of the value function sequence {V_i} satisfies the DTHJB equation, i.e., V_∞(x(k)) = \min_{u(k)} { x^T(k)Qx(k) + u^T(k)Ru(k) + γ V_∞(x(k+1)) }. Besides, from Theorem 2.19, we get V_∞(x(k)) = J^*(x(k)). Therefore, we can find that the value function sequence {V_i(x(k))} converges to the optimal value function J^*(x(k)) of the DTHJB equation, i.e., V_i → J^* as i → ∞. Then, according to (2.74) and (2.90), we can conclude the convergence of the corresponding control law sequence. Now, we present the following corollary.
Corollary 2.20 Define the value function sequence {Vi } as in (2.91) with V0 (·) = 0,
and the control law sequence {vi } as in (2.90). If the system state x(k) is control-
lable, then the sequence {vi } converges to the optimal control law u∗ as i → ∞,
i.e.,
lim vi (x(k)) = u∗ (x(k)).
i→∞
Remark 2.21 Like (2.95), when we further expand (2.91), we obtain a control law
sequence (vi , vi−1 , . . . , v0 ) and the resultant control sequence (vi (x(0)), vi−1 (x(1)),
. . . , v0 (x(i))). With the iteration number increasing to ∞, the derived control law
sequence has the length of ∞. Then, using the corresponding control sequence, we
obtain a state trajectory. However, it is not derived from a single control law. For
infinite-horizon optimal control problem, what we should get is a unique optimal
control law under which we can obtain the optimal state trajectory. Therefore, we
only use the optimal control law u∗ obtained in Corollary 2.20 to produce a control
sequence when we apply the algorithm to practical systems.
2.3.2.4 NN Implementation of the Iterative ADP Algorithm Using GDHP
Technique
When the controlled system is linear and the cost function is quadratic, we can
obtain a linear control law. In the nonlinear case, however, this is not necessarily
true. Therefore, we need to use function approximation structure, such as NN, to
approximate both vi (x(k)) and Vi (x(k)).
Now, we implement the iterative GDHP algorithm in (2.90) and (2.91). In the iterative GDHP algorithm, there are three networks, namely the model network, the critic network, and the action network. All the networks are chosen as three-layer feedforward NNs. The input of the critic network and the action network is x(k), while the inputs of the model network are x(k) and v̂_i(x(k)). The diagram of the whole structure is shown in Fig. 2.14, where
DER = ( ∂x̂(k+1)/∂x(k) + ( ∂x̂(k+1)/∂v̂_i(x(k)) )( ∂v̂_i(x(k))/∂x(k) ) )^T.
The training of the model network is completed after the system identification process, and its weights are kept unchanged. Then, according to Theorem 2.13, given x(k) and v̂_i(x(k)), we can compute x̂(k+1) by (2.77), i.e.,
x̂(k+1) = ω_m^T(k) σ( ν_m^{*T} [x^T(k)  v̂_i^T(x(k))]^T ).
Fig. 2.14 The structure diagram of the iterative GDHP algorithm
As a result, we avoid the requirement of knowing f (x(k)) and g(x(k)) during the
implementation of the iterative GDHP algorithm.
Next, the learned NN system model will be used in the process of training critic
network and action network.
The critic network is used to approximate both V_i(x(k)) and its derivative ∂V_i(x(k))/∂x(k), which is named the costate function and denoted λ_i(x(k)). The output of the critic network is denoted
[ V̂_i(x(k)) ; λ̂_i(x(k)) ] = [ ω_{ci}^{1T} ; ω_{ci}^{2T} ] σ( ν_{ci}^T x(k) ) = ω_{ci}^T σ( ν_{ci}^T x(k) ),   (2.116)
where ω_{ci} = [ ω_{ci}^1  ω_{ci}^2 ], i.e.,
V̂_i(x(k)) = ω_{ci}^{1T} σ( ν_{ci}^T x(k) )   (2.117)
and
λ̂_i(x(k)) = ω_{ci}^{2T} σ( ν_{ci}^T x(k) ).   (2.118)
The target function can be written as
V_{i+1}(x(k)) = x^T(k)Qx(k) + v_i^T(x(k))Rv_i(x(k)) + γ V̂_i(x(k+1))   (2.119)
and
λ_{i+1}(x(k)) = ∂( x^T(k)Qx(k) + v_i^T(x(k))Rv_i(x(k)) )/∂x(k) + γ ∂V̂_i(x(k+1))/∂x(k)
             = 2Qx(k) + 2( ∂v_i(x(k))/∂x(k) )^T R v_i(x(k))
               + γ ( ∂x̂(k+1)/∂x(k) + ( ∂x̂(k+1)/∂v̂_i(x(k)) )( ∂v̂_i(x(k))/∂x(k) ) )^T λ̂_i(x(k+1)).   (2.120)
Then, we define the error functions for training the critic network as
e^1_{cik} = V̂_i(x(k)) − V_{i+1}(x(k))   (2.121)
and
e^2_{cik} = λ̂_i(x(k)) − λ_{i+1}(x(k)).   (2.122)
The objective function to be minimized in the critic network training is
E_{cik} = (1 − β)E^1_{cik} + βE^2_{cik},   (2.123)
where
E^1_{cik} = (1/2) e^{1T}_{cik} e^1_{cik}   (2.124)
and
E^2_{cik} = (1/2) e^{2T}_{cik} e^2_{cik}.   (2.125)
2
The weight updating rule for training the critic network is also gradient-based adap-
tation given by
1
∂Ecik 2
∂Ecik
ωci (j + 1) = ωci (j ) − αc (1 − β) +β (2.126)
∂ωci (j ) ∂ωci (j )
1
∂Ecik 2
∂Ecik
νci (j + 1) = νci (j ) − αc (1 − β) +β (2.127)
∂νci (j ) ∂νci (j )
where αc > 0 is the learning rate of the critic network, j is the inner-loop iteration
step for updating the weight parameters, and 0 ≤ β ≤ 1 is a parameter that adjusts
how HDP and DHP are combined in GDHP. For β = 0, the training of the critic
network reduces to a pure HDP, while β = 1 does the same for DHP.
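A compact sketch of the combined critic update is given below for the output weights only; the feature map, shapes, and variable names are illustrative assumptions, and the input-layer update of (2.127) is omitted for brevity.

```python
import numpy as np

def gdhp_critic_step(omega_c1, omega_c2, nu_c, x, V_target, lam_target,
                     alpha_c=0.1, beta=0.5):
    """One GDHP critic step: convex mix (2.123) of the HDP error on V and the DHP error on lambda."""
    phi = np.tanh(nu_c.T @ x)                          # shared hidden-layer features
    e1 = omega_c1 @ phi - V_target                     # HDP error (2.121)
    e2 = omega_c2.T @ phi - lam_target                 # DHP error (2.122)
    # gradient of (1-beta)*E1 + beta*E2 w.r.t. the two output-weight blocks, as in (2.126)
    omega_c1 = omega_c1 - alpha_c * (1 - beta) * e1 * phi
    omega_c2 = omega_c2 - alpha_c * beta * np.outer(phi, e2)
    return omega_c1, omega_c2

# beta = 0 recovers a pure HDP critic update, beta = 1 a pure DHP update
```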
In the action network, the state x(k) is used as input to obtain the optimal control.
The output can be formulated as
v̂_i(x(k)) = ω_{ai}^T σ( ν_{ai}^T x(k) ).   (2.128)