System Identification - Data-Driven Modelling of Dynamic Systems
Paul M.J. Van den Hof
Lecture Notes
February 2012
Delft Center for Systems and Control
Delft University of Technology
Mekelweg 2, 2628 CD Delft
The Netherlands
Tel. +31-15-2784509
Note
The author acknowledges discussions with and contributions from many colleagues, among whom Xavier Bombois (Delft University of Technology), co-teacher of system identification courses for many years, and Raymond de Callafon (University of California, San Diego).
© Copyright Paul Van den Hof, 1996-2012. All rights reserved. No part of this work may be reproduced, in any form or by any means, without permission from the author.
Contents
1 Introduction
1.1 Model building
1.2 System identification - data-driven modelling
1.3 The identification procedure
1.4 Historical overview and present highlights
List of Symbols

∂/∂θ  Derivative with respect to θ
ℂ  Set of complex numbers
ε(t, θ)  One-step-ahead prediction error
e  White noise stochastic process
E  Expectation
Ē  Generalized expectation of a quasi-stationary process
fe  Probability density function of e
φ(t)  Regression vector
G0  Data generating input-output system
𝒢  Set of input-output models
H0  Noise shaping filter in data generating system
ℳ  Model set
ℕ  Set of natural numbers, 1, 2, …
𝒩  Gaussian or normal distribution
N  Number of data
ω  Radial frequency
ωs  Sampling (radial) frequency
Φu(ω)  (Auto-)spectral density of signal u
Φyu(ω)  Cross-spectral density of signals y and u
ψ(t, θ)  Partial derivative of output predictor w.r.t. θ
ℝ  Set of real numbers
Ru(τ)  Autocovariance function of u
Ryu(τ)  Crosscovariance function of y and u
𝒮  Data generating system (G0, H0)
σe²  Variance of white noise source e
θ  Parameter vector
θ̂N  Estimated parameter based on N data
θ*  Limiting parameter estimate (N → ∞)
Θ  Domain of parameter vector
Ts  Sampling interval
u  Input signal
v  Output disturbance signal
V̄(θ)  Limiting quadratic cost function
y  Output signal
yᵗ  {y(s), s ≤ t − 1}
ŷ(t|t − 1)  One-step-ahead output predictor
ℤ  Set of integer numbers
Z^N  Set of input/output data {(y(t), u(t)), t = 0, …, N − 1}
∗  Convolution
x̄  Complex conjugate of x
Chapter 1
Introduction
- Mental models. These reflect the intuitive notion that people have of a system's behaviour. Take e.g. the human operator that is able to control a complicated system very well just by experience (the captain of a huge tanker, people driving a car or bicycle, a pilot steering an airplane).
- Software models. These reflect system descriptions that are contained in (extensive) software programs, possibly including if-then rules, discrete-event types of mechanisms and look-up tables (the schedule of a railway company).
- Graphical models. System descriptions in the form of (nonlinear) characteristics and graphs, Bode and Nyquist plots, impulse and step responses.
- Mathematical models. System descriptions in the form of mathematical relations, e.g. (partial) differential and difference equations, fuzzy types of models and neural networks.
In this book we will concentrate on models of the last two categories, and in particular on mathematical models, while the main emphasis will be on linear time-invariant models in the form of difference equations.
Model-based engineering is by far the dominant engineering paradigm for the systematic design, operation and maintenance of engineering systems, and within the engineering field it is present in a wide variety of disciplines. In all those areas where dynamics plays a role, the systems and control field has a growing role to play through its formalism for deriving and handling models of dynamical systems.
In this respect the particular characteristics of the systems and control field include:
- A framework for dynamic modelling that runs across many disciplines; mechanical engineering systems can be connected to electrical systems, to optical, flow or chemical systems. This is achieved by considering dynamic systems in a conceptual way, e.g. by causal input-output mappings (e.g. represented by transfer functions), that can simply be connected to each other.
- The understanding of interconnections of systems, which is a central topic in systems and control theory and has led to the important concept of feedback, considered the crown jewel of the field.
- The ability to characterize and handle uncertainties, and the ability to create technological systems that are robust, i.e. that can operate under uncertain circumstances and in uncertain environments. In this respect note that the accuracy with which a laser spot is positioned on the information track of a DVD player largely outreaches the tolerances in the specifications of the several mechanical parts from which the equipment is produced. This is only possible on the basis of accurate sensing (measurements) and on-line (feedback) control.
Within the area of systems and control engineering, one will generally deal with models of dynamic systems, having different purposes in mind. You could speak about Model-Based X, where X stands for several options:
- Control design. The design of a feedforward and/or feedback control system that achieves stabilization, disturbance rejection, tracking of reference trajectories etc.; in other words a control system that improves or optimizes the performance/efficiency of a dynamical system.
Having illustrated the several options for Model-Based X, we still have not said anything about the way in which models are constructed.
In engineering systems the dynamical systems mostly concern dynamical relationships between physical quantities. This implies that one can often use the basic laws of physics (first principles) to arrive at a model of the system. In this case the laws of physics are the main source for the model to be constructed. Complementary to this, measurement data of the input and output variables of the system can contain all relevant information on the underlying system dynamics. So, rather than building models in theory, extracting information from experimental data can be an effective approach to building models of the actual and emerging behaviour of dynamical systems. This leads us to an area which has become known as system identification.
In many situations the first approach is followed, e.g. in the chemical process industry, but also in mechanical types of systems. Here we generally have good knowledge about the basic principles that govern the system behaviour. This, however, does not alter the fact that in many situations physical modelling is not the only tool we need in order to arrive at highly accurate models. Even in physical modelling we generally need numerical values of (physical) coefficients that have to be substituted in the model relations. Exact values of masses, stiffnesses, material properties, properties of chemical reactions, or other (presumed) constants need to be known. And not only the direct relation between input variables (causes) and output variables (effects) is of importance here. Our dynamical system will practically never perform as a perfect set of deterministic equations; in real life the system variables will be subject to all kinds of disturbances that influence the input-output relations within a dynamical system. For arriving at accurate models, and for their appropriate use, not only the direct input/output relation is of importance, but one also needs to quantify what kind of disturbances act on the system. These disturbances will limit the validity of deterministic models in any type of application.
Generally one can say that for accurate modelling of systems the handling and analysis of real-life measurements is very important.
What are specific situations in which first principles modelling alone has its limitations?
- First principles modelling is impossible. This happens e.g. in the situation that we simply have no first principles relations available to construct our model. Consider e.g. econometric modelling, or the modelling of human behaviour.
- First principles models are too complex. Modelling huge chemical plants will most often lead to huge models that, in order to be complete, require such a high level of complexity (a high number of equations) that the model becomes intractable for particular applications.
- First principles models are unreliable. This can happen e.g. if the first principles relations do not describe the behaviour of the system in enough detail. This does not imply that the resulting models are bad; they may just not be accurate enough for the application that one has in mind.
- One may need real data experiments to validate models that are constructed on first principles. Even if the models are accurate, the model designer may want to gain serious confidence in his model by confronting it with real measurement data (validation).
In the situations sketched above, experimental data of the process can be used in order to arrive at an appropriate model. The situation that a model is identified purely on the basis of data, without taking particular account of the physical structure, is referred to as black box identification. This is in contrast with the white box approach that is taken in the case of pure physical modelling.
Some examples of processes are given for which physical modelling does not suffice to construct models that are suitable for the desired application. In all examples the intended application of the models is to use them as a basis for model-based control design.
Example 1.2.1 (Glass tube production process) The industrial process under consideration is a glass tube manufacturing process, schematically depicted in Figure 1.1¹. By direct electric heating, quartz sand is melted and flows down through a ring-shaped hole along the accurately positioned mandrel. Under pressure, gas is led through the hollow mandrel. The glass tube is pulled down due to gravity and supported by a drawing machine.
Shaping of the tube takes place at, and just below, the end of the mandrel. The longitudinal shape of the tube is characterized by two important dimensions, which are taken as the most important output variables to be controlled: tube diameter and tube wall-thickness. The purpose of this process is to produce glass tubes with a prespecified tube diameter and wall-thickness with high accuracy, i.e. allowing only small variations around the prespecified nominal values.
¹This picture was made available by H. Falkus, Department of Electrical Engineering, Eindhoven University of Technology.
[Figure 1.1: Schematic depiction of the glass tube production process: reservoir, melting vessel (power supply, melting vessel pressure), melted glass, mandrel (mandrel pressure), shaping part (wall-thickness) and drawing machine (drawing speed).]
Both output variables are influenced by many process conditions, such as:
- drawing speed,
- mandrel pressure,
- power,
- melting vessel pressure, and
- composition of raw materials.
Some of these have a small bandwidth (power and composition of raw materials), have only a minor influence on the glass quality (composition of raw materials), or have extremely large delay times involved (power, melting vessel pressure and composition of raw materials). Therefore these are not well suited for control of the tube dimensions.
Shaping of the glass tube is clearly a multivariable process with a high degree of interaction. An increase of the mandrel pressure results in an increase of the tube diameter and a decrease of the tube wall-thickness. An increase of the drawing speed causes a decrease of both diameter and wall-thickness. A physical model of this shaping part can be obtained by deriving the physical laws of the shaping process, describing the shaping of the tube in detail and over the full range of possible operating points, determined by various values of tube diameter and wall-thickness. However, this physical model is very complex and contains physical parameters with numerical values that are unknown for the different operating points. Besides, for complete modelling of this process by physical laws, there is simply not enough knowledge available of all physical details that play a role in this shaping process. Therefore one has to rely on experimental data to arrive at an appropriate description of the dynamic behaviour of this process.
In terms of a block diagram, the considered process is reflected in Figure 1.2. Here two input variables are considered, being the mandrel pressure and the drawing speed; there are also two output variables, being the tube diameter and the wall-thickness, and two disturbance signals that act on the output variables, reflecting all kinds of disturbances that act upon the system. These disturbances incorporate not only the measurement disturbances, but also several aspects of the system that are simply discarded when using only the two input variables as the source for explaining the two output variables of the process.
Figure 1.4: Schematic depiction (enlarged) of an audio disc surface, with the laser spot on a 1.6 µm wide track. For DVD and Blu-ray the track width reduces to 0.74 and 0.32 µm.
that is externally available is the radial error signal, indicating whether the laser spot is covering a track correctly. This error signal is processed by an optical measurement unit and converted into a current signal.
In considering the dynamical properties of this system, one has to realize what kinds of effects play a role. The process as indicated in the block diagram of Figure 1.5 contains dynamic properties of both the actuator and of the whole mechanical structure on which the actuator is mounted. Note that because of the high precision that is required in the motion control of this mechanism, the dynamical properties of the environment play an important role. The radial distance between two different tracks is 1.6 µm, and the required accuracy of the positioning control is 0.1 µm. The importance of the dynamic properties of the mechanical structure on which the pick-up mechanism is mounted is reflected in the fact that each separate CD player shows different dynamic behaviour, in the form of different locations of resonant modes.
Figure 1.5: Block diagram of the radial system of a CD mechanism; OPU = optical pick-up unit.
Figure 1.6: Measured frequency response (amplitude Bode plot) of the radial positioning mechanism in a CD player optical pick-up unit. The frequency axis is in [Hz].
Using physical models for characterizing the dynamic behaviour of this system poses severe problems, due to the extremely high accuracy of modelling that is required, and due to the influential role of the (mechanical) environment. Identifying dynamic models on the basis of measurement data can lead to models having the accuracy that is required for this high-performance control problem. A typical example of a measured amplitude Bode plot (frequency response) of the radial positioning mechanism in a CD player is depicted in Figure 1.6. It matches the frequency response of a double integrator (transfer from force to position), with on top of that high-order flexibilities in the construction, which appear at higher frequencies.
The two processes considered are examples of situations in which identification of models from measurement data contributes essentially to the construction of accurate models. A (partial) combination of physical modelling and black box modelling is also possible, and in fact quite appealing. This approach is often denoted by the term grey box modelling. In this case physical quantities are identified from data, but the basic model structure is directed by the first principles relations of the process. One of the properties of black box models is that they can provide a compact system description without making explicit statements concerning every single physical subsystem. The system is considered as one unity that exhibits itself through its external variables (input and output quantities).
As mentioned before, in system identification we consider the dynamical system to be modelled to exhibit itself through its external (measurable) variables, i.e. its input and output signals. Let's take a look at the different signals that we can distinguish:
- Measurable output signals. These are measurable signals that can be considered as consequences or responses of the system behaviour; they cannot be manipulated directly by the experimenter.
- Measurable input signals. These are measurable control signals that act as a cause of the system response. Generally they can be manipulated to a certain extent. If they cannot be manipulated, they are sometimes considered as (measurable) disturbance signals.
- Non-measurable disturbances. These reflect non-measurable disturbances that act on the system. These disturbances can neither be manipulated nor influenced.
It is not very simple to formally define what we mean by the system or the model. Actually, what we call a system or a model is determined by a number of (external) variables or signals that exhibit a specific relation with each other. How we will deal with systems and models in a more formal way will be specified at a later stage.
Identification criterion. Given measurement data {y(t), u(t)}_{t=1,…,N}, each model (and therefore each parameter θ) will generate a residual signal ε(t, θ).
[Figure: block diagram of the prediction-error configuration: input u and disturbance v act on the data generating system S, producing output y; the model M(θ) maps the measured data (u, y) to the residual ε(θ).]
This residual signal ε(t, θ) can be used as a basis for constructing the identified model(s), and this can be done in several different ways. The most standard and classical way is to construct a cost function, which is minimized over θ. The most popular choice is a least-squares criterion:

VN(θ) = (1/N) Σ_{t=1}^{N} ε²(t, θ).
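As an illustration, the sketch below evaluates this least-squares criterion for a simple first-order ARX model structure, in which the residual is linear in the parameters, so that the minimizing estimate follows from linear regression. The model structure, signal lengths and noise levels are illustrative assumptions, not taken from the text.

```python
# Minimal sketch of the least-squares criterion V_N(theta) for a
# first-order ARX model y(t) = a*y(t-1) + b*u(t-1) + e(t), so that
# eps(t, theta) = y(t) - theta^T [y(t-1); u(t-1)]. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)                  # input signal
y = np.zeros(N)
for t in range(1, N):                       # "true" system with a=0.7, b=1.0
    y[t] = 0.7 * y[t-1] + 1.0 * u[t-1] + 0.1 * rng.standard_normal()

def V_N(theta):
    """Quadratic cost: (1/N) times the sum of squared residuals."""
    a, b = theta
    eps = y[1:] - (a * y[:-1] + b * u[:-1])
    return np.mean(eps**2)

# For this model structure the minimizer has a closed form (linear regression):
Phi = np.column_stack([y[:-1], u[:-1]])     # regression matrix, rows phi(t)^T
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta_hat, V_N(theta_hat))            # estimate close to [0.7, 1.0]
```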
This general paradigm shows that we consider models in two different ways. For identification purposes we consider a model as a mapping from measured data to some kind of residual signal, i.e. a mapping from (y, u) to ε. On the other hand, if we consider a model as an abstraction of the data generating system, we interpret it as a mapping from input and disturbance signals to the output, i.e. a mapping from (u, v) to y. This duality is reflected in the block diagrams depicted in Figure 1.8, and will play an important role in the sequel of this course.
Figure 1.8: Two appearances of models; (a) model for identification, (b) model as reflection of the data generating system.
The different aspects that are crucial in any identification procedure are reflected in the scheme of Figure 1.9, which is due to Ljung (1987).
[Figure 1.9: Identification procedure: prior knowledge and the intended model application drive the experiment design (delivering data), the choice of model set and the identification criterion; from these a model is constructed and then validated; if the model is not OK, the procedure is repeated with adjusted choices.]
Model set. It has to be specified beforehand within which set of models one is going to evaluate the most accurate model for the process at hand. In the model set several basic properties of the models have to be fixed, such as linearity/nonlinearity, time-invariance, discrete/continuous time, and other structural properties (such as the order) of the models.
Identification criterion. Given measurement data and a model set, one has to specify in which way the optimal model from the model set is going to be determined. In applying the criterion, the models in the model set are confronted with the measurement data.
In all three different aspects, prior knowledge about the system to be identified can play an important role.
Given specific choices for the three items described above, it is generally a matter of numerical optimization to construct an identified model.
Additionally, the final question that has to be dealt with is whether one is satisfied with the model obtained. This latter step in the procedure is indicated by the term model validation. Whether one is satisfied with the result will in many situations depend very much on what the model is intended for. The ultimate answer to the validation question is then that one is satisfied with the model if, in the intended model application, one is satisfied with the result. If the model is invalidated, then a redesign of the identification experiment, or an adjustment of the model set and identification criterion, may lead to an improved model.
The several aspects briefly indicated here will be the subject of further analysis in the chapters to come.
In the 1980s this basic assumption was relaxed, giving more attention to the more realistic situation that system identification generally comes down to approximate modelling rather than exact modelling. Issues of approximation have become popular, a development which was pulled mainly by the Swedish school of researchers in Lund, Linköping and Uppsala. See e.g. Ljung and Caines (1979), Ljung and Söderström (1983), Wahlberg and Ljung (1986). A good overview of this development, which turned parameter estimation into system identification, is given in the work of Ljung (1999), whose book first appeared in 1987, and Söderström and Stoica (1989).
Interest in the issue of approximation made people move away from notions such as consistency, and made them pay attention to the type of approximation that becomes involved. A related issue that comes into the picture is the intended application or goal of the model. As identifying a system no longer means finding an exact representation, but rather finding an approximation, specific model goals might dictate which types of approximation are desirable. In other words: which aspects of the system dynamics will be incorporated in the model, and which aspects will be neglected. The intended model goal will have to point to the right way to go here.
Especially in the area of approximate modelling, the 1990s have shown an increasing interest in identifying approximate models that are suitable for serving as a basis for model-based control design. This means that, although one realizes that the models obtained are only approximative, one would like to obtain models that are accurate descriptions of the system dynamics in those aspects of the system that are specifically important from a control-design point of view. Surveys of this development are given in Gevers (1993), Van den Hof and Schrama (1995), Albertos and Sala (2002), Hjalmarsson (2005) and Gevers (2005).
Another area of interest, which is extremely relevant from an applications point of view, is the question concerning the accuracy of identified (approximate) models. Experimental data provide us with information concerning the dynamical system; besides the problem of extracting an appropriate model from the measured data, it is important to be able to make statements concerning the accuracy and reliability of this result. This area, sometimes denoted as model uncertainty estimation, has been a part of the classical analysis in the form of providing confidence intervals for parameter estimates, however always restricted to the situation in which consistent models were estimated. In an approximative setting of identification, this issue is still an important subject of research, being closely related to the question of model (in)validation, and to the goal-oriented design of experiments; see e.g. Bombois et al. (2006).
Whereas most identification techniques are developed and analyzed in the time domain, the frequency domain also offers a multitude of methods and tools, and sometimes particular advantages. Over the years the difference between the two domains has become smaller, and has come to be characterized more as a difference between the excitation signals used, being periodic or not. An account of this development is given in Schoukens and Pintelon (1991) and Pintelon and Schoukens (2001).
As indicated before, several choices can be made for the identification criterion. In bounded error modelling, or set-membership identification, hard-bounded specifications are typically given for the residual signals in order to invalidate the accompanying models. Unlike the least-squares prediction error approaches, these methods do not lead to a single model estimate, but principally deliver a set of (non-invalidated) models. In this sense they directly point to a model uncertainty quantification. More details on this area can be found in Milanese and Vicino (1991) and Milanese et al. (1996). A principal problem here is the choice of the residual bound; an inappropriate choice can easily lead to overly conservative model uncertainty bounds.
In so-called subspace identification the identification criterion is not expressed explicitly. This popular area, which emerged during the nineties (Van Overschee and De Moor, 1996; Verhaegen, 1994), connects with realization theory and essentially encompasses projections of signal spaces onto subspaces that represent limited-dimensional linear systems. A principal tool in these operations is the singular value decomposition. One of the advantages of the approach is that the handling of multivariable systems is practically as simple as the handling of scalar systems.
Important challenges are present in the problem of identifying models with nonlinear dynamics. Whereas in many applications it suffices to consider linear models of a linearized nonlinear plant, the challenge to express the nonlinear dynamical phenomena of the plant in a nonlinear model often enhances the capabilities of the model, e.g. when designing a control system that moves the plant through several operating regimes. For contributions in this area see Suykens and Vandewalle (1998), Nelles (2001) and Tóth (2010). In this book, attention will mainly be focused on linear models.
Bibliography
P. Albertos and A. Sala (Eds.) (2002). Iterative Identification and Control. Springer Verlag, London, UK, ISBN 1-85233-509-2.
K.J. Åström and T. Bohlin (1965). Numerical identification of linear dynamic systems from normal operating records. Proc. IFAC Symp. Self-Adaptive Contr. Systems, Teddington, England, pp. 96-110.
K.J. Åström and P. Eykhoff (1971). System identification - a survey. Automatica, vol. 7, pp. 123-162.
X. Bombois, G. Scorletti, M. Gevers, P.M.J. Van den Hof and R. Hildebrand (2006). Least costly identification experiment for control. Automatica, Vol. 42, no. 10, pp. 1651-1662.
P. Eykhoff (1974). System Identification: Parameter and State Estimation. Wiley & Sons, London.
K.F. Gauss (1809). Theoria motus corporum coelestium. English translation: Theory of the Motion of the Heavenly Bodies. Dover, New York, 1963.
M. Gevers (1993). Towards a joint design of identification and control? In: H.L. Trentelman and J.C. Willems (Eds.), Essays on Control: Perspectives in the Theory and its Applications. Proc. 1993 European Control Conference, Groningen, The Netherlands, Birkhäuser, Boston, pp. 111-151.
M. Gevers (2005). Identification for control: from the early achievements to the revival of experiment design. European J. Control, Vol. 11, no. 4-5, pp. 335-352.
M. Gevers (2006). A personal view of the development of system identification. IEEE Control Systems Magazine, Vol. 26, no. 6, pp. 93-105.
G.C. Goodwin and R.L. Payne (1977). Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, New York.
P.S.C. Heuberger, P.M.J. Van den Hof and B. Wahlberg (Eds.) (2005). Modelling and Identification with Rational Orthogonal Basis Functions. Springer Verlag, London, UK.
H. Hjalmarsson, M. Gevers, S. Gunnarsson and O. Lequin (1998). Iterative feedback tuning: theory and applications. IEEE Control Systems Magazine, Vol. 18, no. 4, pp. 26-41.
H. Hjalmarsson (2005). From experiment design to closed-loop control. Automatica, Vol. 41, no. 3, pp. 393-438.
R. Johansson (1993). System Modelling and Identification. Prentice-Hall, Englewood Cliffs, NJ.
L. Ljung (1999). System Identification - Theory for the User. Second edition. Prentice-Hall, Englewood Cliffs, NJ.
L. Ljung and P.E. Caines (1979). Asymptotic normality of prediction error estimators for approximate system models. Stochastics, vol. 3, pp. 29-46.
L. Ljung and T. Söderström (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA, USA.
M. Milanese and A. Vicino (1991). Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica, vol. 27, pp. 997-1009.
M. Milanese, J. Norton, H. Piet-Lahanier and E. Walter (Eds.) (1996). Bounding Approaches to System Identification. Plenum Press, New York.
O. Nelles (2001). Nonlinear System Identification - From Classical Approaches to Neural Networks and Fuzzy Models. Springer Verlag, Inc., New York, ISBN 3-540-67369-5.
J.P. Norton (1986). An Introduction to Identification. Academic Press, London, UK.
R. Pintelon and J. Schoukens (2001). System Identification - A Frequency Domain Approach. IEEE Press, Piscataway, NJ, USA, ISBN 0-7803-6000-1.
J. Schoukens and R. Pintelon (1991). Identification of Linear Systems - A Practical Guide to Accurate Modeling. Pergamon Press, Oxford, UK.
T. Söderström and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead, UK.
J.A.K. Suykens and J.P.L. Vandewalle (Eds.) (1998). Nonlinear Modelling - Advanced Black-Box Techniques. Kluwer Academic Publ., Dordrecht, The Netherlands.
R. Tóth (2010). Modeling and Identification of Linear Parameter-Varying Systems - An Orthonormal Basis Function Approach. Springer Verlag, Berlin, Germany.
P.M.J. Van den Hof and R.J.P. Schrama (1995). Identification and control - closed loop issues. Automatica, vol. 31, pp. 1751-1770.
P. Van Overschee and B.L.R. De Moor (1996). Subspace Identification for Linear Systems. Kluwer Academic Publ., Dordrecht, The Netherlands.
M. Verhaegen (1994). Identification of the deterministic part of MIMO state space models given in innovations form from input-output data. Automatica, Vol. 30, no. 1, pp. 61-74.
M. Verhaegen and V. Verdult (2007). Filtering and System Identification - A Least Squares Approach. Cambridge University Press, Cambridge, UK.
E. Walter and L. Pronzato (1997). Identification of Parametric Models from Experimental Data. Springer, Berlin.
B. Wahlberg and L. Ljung (1986). Design variables for bias distribution in transfer function estimation. IEEE Trans. Automat. Control, vol. AC-31, no. 2, pp. 134-144.
Y. Zhu (2001). Multivariable System Identification for Process Control. Elsevier Science Ltd., Oxford, UK, ISBN 0-08-043985-3.
Chapter 2
2.1 Introduction
System identification deals with signals and systems. Based on measured signals of a physical process, the aim is to arrive at a model description of this process in the form of a dynamical system.
For building up the framework used for handling signals and systems, attention will be paid to several analysis tools. The basic information content of signals will be examined in terms of the frequency components that are present in a signal (Fourier series, Fourier transforms), and in terms of the distribution of energy and/or power of signals over frequency.
First, attention will be given to continuous-time systems and related signals. However, since all signal processing performed by digital hardware has to be done in discrete time (sampled signals), attention will be focused on sampling continuous-time signals and on the evaluation of relevant properties of discrete-time signals and systems.
The Discrete-Time Fourier Transform (DTFT) for infinite sequences as well as the Discrete Fourier Transform (DFT) for finite sequences will be summarized, and specific attention will be given to both deterministic signals and stochastic processes. For the analysis of identification methods in later chapters it will appear attractive to also be able to deal with signals that are composed of both deterministic and stochastic components. In view of this, the notion of quasi-stationary processes/signals is discussed.
The treatment of the material will be done in a summarizing style rather than on an introductory level. It is assumed that the reader has a basic knowledge of signals and systems theory.
y(t) = G(p)u(t),  t ∈ ℝ

In the Laplace domain this relation reads Y(s) = G(s)U(s), where s is the Laplace variable, being a complex indeterminate, and Y(s), U(s) the Laplace transforms of output and input respectively. The transfer function G(s) is a complex function, and the poles and zeros of G(s) in the complex plane give insight into the dynamic properties of the system. Some relevant properties of the dynamical system G:
- Linearity. Linearity of the dynamical system is induced by the ordinary linear differential equation that governs the dynamical system.
- Causality. The mechanism that y(t) does not depend on u(τ) for τ > t. This is reflected by the condition that g(t) = 0 for t < 0. In terms of G(s) this is reflected by the condition lim_{|s|→∞} G(s) < ∞.
The frequency response of G(s) is given by G(iω), ω ∈ ℝ. It determines in which way sinusoidal input signals are processed by the system. For u(t) = sin(ωt) the stationary response of the system is

y(t) = |G(iω)| sin(ωt + φ)

with φ = arg G(iω).
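The following minimal sketch illustrates this frequency response property numerically, for an illustrative first-order system G(s) = 1/(s + 1) (an assumption, not an example from the text): after the transient has died out, the simulated response coincides with |G(iω)| sin(ωt + φ).

```python
# Stationary sinusoidal response: for u(t) = sin(w*t), the output of a
# stable LTI system tends to |G(iw)|*sin(w*t + arg G(iw)).
import numpy as np
from scipy.signal import lsim, TransferFunction

G = TransferFunction([1.0], [1.0, 1.0])        # illustrative G(s) = 1/(s+1)
omega = 2.0
t = np.linspace(0, 30, 3000)
_, y, _ = lsim(G, U=np.sin(omega * t), T=t)    # simulated time response

Giw = 1.0 / (1j * omega + 1.0)                 # G evaluated at s = i*omega
y_stat = np.abs(Giw) * np.sin(omega * t + np.angle(Giw))
print(np.max(np.abs(y[-500:] - y_stat[-500:])))  # ~0 after the transient
```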
A direct relation between input and output signals can also be obtained by convolution:

y(t) = ∫₀^∞ g(τ) u(t − τ) dτ

where g(τ) is the inverse Laplace transform of G(s), i.e. they are related by the Laplace transform relations

G(s) = ∫₀^∞ g(t) e^{−st} dt,
g(τ) = (1/2πi) ∫_{σ−i∞}^{σ+i∞} G(s) e^{sτ} ds.
The above result can simply be verified by substituting the Fourier series of u into the expression for the power of the signal. This shows that every exponential function in u has an independent contribution to the power of the signal.
Figure 2.1: Three different types of signals: (a) finite-energy signal, (b) periodic finite-power signal, and (c) random-like signal (e.g. a realization of a stationary stochastic process).
If we assume that u(t) is real-valued, the consequence is that the Fourier series coefficients will satisfy c₋ₖ = c̄ₖ.
form pair²:

U(ω) = ∫_{−∞}^{∞} u(t) e^{−iωt} dt    (2.4)

u(t) = (1/2π) ∫_{−∞}^{∞} U(ω) e^{iωt} dω.    (2.5)

The signal u has to satisfy certain conditions in order for the integral (2.4) to exist. The Fourier transform exists for signals that satisfy the Dirichlet conditions³, being a set of sufficient conditions.
The Dirichlet conditions imply that the signals have to satisfy ∫_{−∞}^{∞} |u(t)| dt < ∞. This signal class contains at least all energy signals. Note that in order for a signal to have finite energy it is necessary that |u(t)| → 0 for t → ±∞.
Additionally, analytical power signals, including periodic signals, can be transformed, provided that the mathematical framework for representing the transforms is extended to incorporate Dirac impulse functions, satisfying δc(t) = 0 for t ≠ 0 and ∫ δc(t) dt = 1. By the notion 'analytical signal' it is meant that the signals are characterized by analytical expressions. The standard tables for the Fourier transform and its basic properties can be applied. This includes all kinds of periodic signals, the step function, the sign function etcetera.
For the above classes of signals the Fourier transform can be applied. When transforming signals from the second category (finite power signals), the Fourier transform will generally be unbounded, i.e. it will contain Dirac impulses. This is caused by the fact that the energy content of these signals is unbounded.
The rules for calculating Fourier transforms of specific signals, and the specific relations between transformed signals that can be found in any book on Fourier analysis, are all based on the transform pair (2.4),(2.5). Note that the Fourier transform is actually obtained from the Fourier series, by periodically extending a finite-time signal, obtaining its Fourier series coefficients, and taking the limit with the period length tending to infinity.
For non-analytical signals, i.e. signals that cannot easily be constructed by a mathematical expression, and having infinite energy, the Fourier transform will not exist. This happens e.g. for realizations of stochastic processes, like the signal depicted in Figure 2.1(c). These kinds of signals will be discussed separately in Section 2.3.3.
Finite-time signals
When considering continuous-time signals over a finite time, the corresponding Fourier transform is denoted by

UT(ω) := ∫₀^T u(t) e^{−iωt} dt.    (2.6)
²In this work the argument ω is used in U(ω) for convenience; the connection with the Laplace transform U(s) appears more natural in the notation U(iω).
³The Dirichlet conditions for general (non-periodic) signals are that on any finite interval the signal u: (a) has at most a finite number of discontinuities, (b) has at most a finite number of maxima and minima, and (c) is bounded; additionally u should be absolutely integrable, i.e. ∫_{−∞}^{∞} |u(t)| dt < ∞. See Phillips et al. (2008). If the right hand side of (2.5) is evaluated in a point t where u is discontinuous, then it equals ½[u(t⁻) + u(t⁺)], with u(t⁻) = lim_{τ↑t} u(τ) and u(t⁺) = lim_{τ↓t} u(τ).
This concerns finite-time signals that are defined over the time interval [0, T]. The transform comes down to the formal Fourier transform of an infinite-time signal that is created by extending the finite-time signal to infinity with zeros padded on both sides of the time interval.
Periodic signals
For a periodic signal with period T₀ the coefficients of the Fourier series can be directly related to a finite-time Fourier transform. Directly from (2.2) it follows that

cₖ = (1/T₀) U_{T₀}(kω₀).    (2.7)

For periodic signals, the expressions for the Fourier transform can be shown to be directly related to the Fourier series coefficients. Equating the Fourier series (2.1) with the inverse Fourier transform (2.5), it follows that for this periodic signal u the Fourier transform satisfies

U(ω) = 2π Σ_{k=−∞}^{∞} cₖ δc(ω − kω₀).    (2.8)

This shows that in the Fourier transform of a periodic signal the integral expression in (2.4) reduces to a summation.
U(ω) = Aπ [δc(ω − ω₀) + δc(ω + ω₀)],

showing that a discrete number of frequencies kω₀ contribute to the power of the signal.
showing two equivalent expressions for the energy of a signal. This directly leads to the
following notion (see e.g. Phillips et al., 2008).
Proposition 2.3.2 (Energy Spectral Density Function) Let u(t) be a finite energy signal, i.e. ∫_{−∞}^{∞} u(t)² dt = Eu < ∞. Then

Eu = (1/2π) ∫_{−∞}^{∞} Ψu(ω) dω

where the Energy Spectral Density Ψu(ω) is given by

Ψu(ω) = |U(ω)|².
With the same line of reasoning a similar expression can be given for the situation of power signals⁴.

Proposition 2.3.3 (Power Spectral Density Function) Let u(t) be a finite power signal of length T, i.e. (1/T) ∫_{−T/2}^{T/2} |u(t)|² dt = Pu < ∞. Then

Pu = (1/2π) ∫_{−∞}^{∞} Φu(ω) dω

where the Power Spectral Density Φu(ω) is given by

Φu(ω) = (1/T) |UT(ω)|²    (2.10)

with UT(ω) as defined in (2.6).
Note that for the particular situation of a periodic signal, expression (2.10) is less suitable. In this case (2.9) directly shows that

Φu(ω) = 2π Σ_{k=−∞}^{∞} |cₖ|² δc(ω − kω₀).
(a) If we restrict attention to one single realization of the process, the signal {x(t)}, t ∈ ℝ, can be considered as a (deterministic) finite-power signal, and the analysis as presented in the previous part of this section applies.
(b) If we want to focus on the properties of the stochastic process, and not on one single (random) realization of the process, we have to analyze in which sense properties of the stochastic process can be recovered from discrete samples of the stochastic process.
In this subsection attention will be paid to situation (b) described above. For situation (a) we refer to the previous sections.
In this section a stationary stochastic process x will be considered, i.e.
(a) For every t ∈ ℝ, x(t) is a random variable with a fixed probability density function that governs the outcome of x(t).
(b) E(x(t)) and E(x(t)x(t − τ)) are independent of t for each value of τ ∈ ℝ.
The considered notion of stationarity (limited to restrictions on the first two moments E(x(t)) and E(x(t)x(t − τ))) is mostly referred to as wide-sense stationarity, in contrast with a stricter notion that is related to the time-invariance of the probability density functions.
Some useful notions related to stationary stochastic processes are:
Note that the notions used here are very similar to the corresponding notions for deterministic signals. For instance, Φx(ω) here indicates the distribution of the average power (over the ensemble) of the stochastic process over the different frequencies.
Given an LTIFD system relating input and output signals according to y(t) = G(p)u(t), with u, y zero-mean stationary stochastic processes, the spectral densities satisfy:

Φy(ω) = |G(iω)|² Φu(ω)    (2.13)

Φyu(ω) = G(iω) Φu(ω).    (2.14)
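A numerical check of (2.13)-(2.14) can be made with a discrete-time analogue: white noise filtered by a known system, with the spectral densities estimated by Welch averaging. The filter and the estimator settings below are illustrative assumptions, not taken from the text.

```python
# White noise u through a known filter G; the estimated spectra should
# satisfy Phi_y = |G|^2 Phi_u and Phi_yu = G Phi_u up to estimation error.
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
u = rng.standard_normal(200_000)            # approximately white input
b, a = [0.5], [1.0, -0.8]                   # illustrative G(z) = 0.5/(1 - 0.8 z^-1)
y = signal.lfilter(b, a, u)

f, Pu = signal.welch(u, nperseg=1024)
_, Py = signal.welch(y, nperseg=1024)
_, Pyu = signal.csd(u, y, nperseg=1024)     # scipy convention conj(U)*Y, i.e. Phi_yu

_, G = signal.freqz(b, a, worN=f, fs=1.0)   # G evaluated on the same frequency grid
print(np.max(np.abs(Py / (np.abs(G)**2 * Pu) - 1)))   # small: (2.13) holds
print(np.max(np.abs(Pyu - G * Pu) / np.abs(G * Pu)))  # small: (2.14) holds
```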
The basic tool for analyzing the frequency content of a discrete-time signal is discrete-time Fourier analysis, i.e. the discrete-time Fourier series and the discrete-time Fourier transform. The Fourier series refers to periodic signals, showing that any periodic signal can be written as a summation of harmonic functions (sinusoids). The Fourier transform is a generalization that can also handle non-periodic signals.
ud(k) = Σ_{ℓ=0}^{N₀−1} a_ℓ e^{i(2π/N₀)ℓk}    (2.15)

a_ℓ = (1/N₀) Σ_{k=0}^{N₀−1} ud(k) e^{−i(2π/N₀)ℓk}.    (2.16)
The power of periodic signals can again be written directly as a function of the Fourier coefficients:

Pu = (1/N₀) Σ_{k=0}^{N₀−1} ud(k)² = Σ_{ℓ=0}^{N₀−1} |a_ℓ|².

This shows that every exponential function in u has an independent contribution to the power of the signal, which is simply a summation of the contributions of each separate frequency. As the Fourier coefficients a_ℓ are periodic with period N₀, the sum on the right hand side can be taken over any N₀ consecutive values of ℓ.
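This power relation is easy to verify numerically. In the sketch below the coefficients a_ℓ of an illustrative periodic sequence (an assumption, not an example from the text) are obtained as FFT(ud)/N₀, matching (2.16).

```python
# Check of Pu = (1/N0) sum_k ud(k)^2 = sum_l |a_l|^2 for a periodic sequence.
import numpy as np

N0 = 16
k = np.arange(N0)
ud = 1.0 + 2.0 * np.cos(2 * np.pi * 3 * k / N0)   # periodic with period N0

a = np.fft.fft(ud) / N0                            # Fourier coefficients (2.16)
print(np.mean(ud**2), np.sum(np.abs(a)**2))        # both equal 3.0 here
```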
Note that the discrete-time Fourier transform (DTFT) transforms a discrete sequence of time-domain samples into a function Us(ω) that takes its values continuously over ω. By construction (since k is integer valued) the transform Us(ω) is a periodic function with period 2π/Ts = ωs. Corresponding to this, the integral in (2.18) is taken over any range of ω with length 2π/Ts, being the period length of the integrand.
Finite-time signals
When considering discrete-time signals over a finite time, the corresponding Fourier transform is denoted by:

UN(ω) := Σ_{k=0}^{N−1} ud(k) e^{−iωkTs}.    (2.19)

⁵The given expression for ud(k) actually has resulted from u(kTs) = Σ_{ℓ=0}^{N₀−1} a_ℓ e^{i(2πℓ/(N₀Ts))kTs}, which shows that the effect of the sampling interval Ts is canceled out in the exponent.
Periodic signals
For a periodic signal with period N₀ the coefficients of the Fourier series can be directly related to a finite-time Fourier transform taken over one period of the signal. Directly from (2.16) it follows that

a_ℓ = (1/N₀) U_{N₀}(ℓω₀).    (2.20)

Additionally, the expressions for the Fourier transform can be shown to be directly related to the Fourier series coefficients. Equating the Fourier series (2.15) with the inverse Fourier transform (2.18), it follows that for this periodic signal u the Fourier transform satisfies

Us(ω) = (2π/Ts) Σ_{k=−∞}^{∞} aₖ δc(ω − kω₀)    (2.21)

with ω₀ = 2π/(N₀Ts). In this expression the δ-functions serve to replace the integral expression in (2.18).
with ω₀ = 2π/(N₀Ts), i.e. there are N₀ samples in a single period of the signal. We consider N to be a multiple of N₀, N = rN₀, with r ∈ ℕ. Then

UN(ω) = (A/2) Σ_{k=0}^{N−1} [e^{i(ω₀−ω)kTs} + e^{−i(ω₀+ω)kTs}].    (2.22)
Ψu(ω) = |Us(ω)|².
Φu(ω) = (1/N) |UN(ω)|².

The proof is added in the appendix.
As in the case of continuous-time signals, the (discrete-time) Fourier transform of sampled-data signals constitutes a way to characterize the distribution of energy and/or power of the corresponding signals over the different frequencies.
For finite power signals the quantity (1/N)|UN(ω)|² is referred to as the periodogram of the (finite-time) discrete-time signal. This periodogram determines the distribution of power over frequency.
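A minimal sketch of this periodogram computation (with Ts = 1, so that the DFT grid is ωℓ = 2πℓ/N); the test signal, a noisy sinusoid, is an illustrative choice.

```python
# Periodogram (1/N)|U_N(omega)|^2 of a finite discrete-time signal.
import numpy as np

rng = np.random.default_rng(2)
N = 1024
k = np.arange(N)
ud = np.sin(2 * np.pi * 0.1 * k) + 0.5 * rng.standard_normal(N)

UN = np.fft.fft(ud)                  # U_N evaluated at omega_l = 2*pi*l/N
periodogram = np.abs(UN)**2 / N      # distribution of power over frequency
omega = 2 * np.pi * k / N
print(omega[np.argmax(periodogram[:N // 2])])  # peaks near 2*pi*0.1 ~ 0.63
```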
For periodic signals the power spectral density can again be computed directly on the basis of the discrete-time Fourier coefficients of the signals. Since in this case Pu = Σ_{ℓ=0}^{N₀−1} |a_ℓ|², it follows from the combination of (2.20) and (2.24) that

Φu(ω) = (2π/Ts) Σ_{k=−∞}^{∞} |aₖ|² δc(ω − kω₀).
[Figure: ideal sampler with sampling interval Ts, mapping the continuous-time signal u(t) to the sequence u(kTs).]
The signal u is a continuous-time signal taking values for all t ∈ ℝ, while its sampled version ud(k) = u(kTs) is a discrete sequence defined for integer values k ∈ ℤ.
As a first step the Fourier transforms of the continuous-time and discrete-time signals will be related to each other. Under the condition that ud(k) = u(kTs), it follows from the Fourier transform expressions that

Us(ω) = Σ_{k=−∞}^{∞} u(kTs) e^{−iωkTs}.    (2.25)
By substituting the inverse Fourier transform relation (2.5) for u(kTs), it follows that

Us(ω) = (1/2π) ∫_{−∞}^{∞} U(ξ) Σ_{k=−∞}^{∞} e^{i(ξ−ω)kTs} dξ    (2.26)
      = (1/2π) U(ω) ∗ Σ_{k=−∞}^{∞} e^{−iωkTs}    (2.27)

where ∗ is the convolution operator, and U(ω) the Fourier transform of the underlying continuous-time signal.
Since e^{−iωkTs} = F(δc(t − kTs))⁶, it follows from linearity of the Fourier transform that Σ_{k=−∞}^{∞} e^{−iωkTs} = F(Σ_{k=−∞}^{∞} δc(t − kTs)).
The summation of Dirac functions, being the so-called comb of Dirac,

p(t) := Σ_{k=−∞}^{∞} δc(t − kTs),

is periodic with period Ts and can be written as a Fourier series with coefficients

cₖ = (1/Ts) ∫_{−Ts/2}^{Ts/2} p(t) e^{−ikωs t} dt = 1/Ts.

Therefore p(t) = Σ_{k=−∞}^{∞} (1/Ts) e^{ikωs t}, and since from (2.27) it follows that Us(ω) = F(u(t)p(t)), the expression for Us(ω) becomes

Us(ω) = ∫_{−∞}^{∞} u(t) Σ_{k=−∞}^{∞} (1/Ts) e^{ikωs t} e^{−iωt} dt    (2.28)
      = (1/Ts) Σ_{k=−∞}^{∞} U(ω − kωs).    (2.29)
Figure 2.3: Fourier transform U(ω) of a continuous-time signal and of its sampled version Us(ω), which equals U(ω)/Ts for |ω| ≤ ωs/2 and repeats with period ωs.
When dealing with a signal u(t) that is not band-limited, i.e. whose Fourier transform does not satisfy (2.30), the Fourier transform Us is still constructed according to (2.29), but in that case the shifted versions of U(ω) will be folded on top of each other in Us(ω). This effect is called aliasing. The main consequence of this is that low-pass filtering of us will not recover the original U(ω) anymore.
The basic consequence of the results in this section is that when sampling a continuous-time signal, one has to take care that the signal that is sampled does not contain any frequency components at frequencies higher than the Nyquist frequency ωs/2. Only in that case can all information of the continuous-time signal be recovered from the sampled signal. This leads to the standard use of (continuous-time) anti-aliasing filters before a sampling operation.
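The following small sketch illustrates the aliasing effect: after sampling, a sinusoid above the Nyquist frequency coincides exactly with a folded sinusoid below it. The frequencies chosen are illustrative assumptions.

```python
# Aliasing: a 70 Hz cosine sampled at 100 Hz is indistinguishable from a
# 30 Hz cosine, since |70 - 100| = 30 Hz.
import numpy as np

fs = 100.0                                    # sampling frequency [Hz], Nyquist = 50 Hz
Ts = 1.0 / fs
k = np.arange(200)
x_hi = np.cos(2 * np.pi * 70.0 * k * Ts)      # above the Nyquist frequency
x_alias = np.cos(2 * np.pi * 30.0 * k * Ts)   # folded-down counterpart
print(np.max(np.abs(x_hi - x_alias)))         # ~0: the sampled sequences coincide
```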
The exact reconstruction of a continuous-time signal from its sampled data, as referred to in the Shannon Theorem, can be done by using the inverse relation between Us and U. Provided that the continuous-time signal is band-limited, as mentioned above, it follows that

U(ω) = Ts Us(ω) H(ω)

where H(ω) is the low-pass filter with cut-off frequency ωN, determined by

H(ω) = 1 for |ω| ≤ ωN, and H(ω) = 0 for |ω| > ωN,

leading to

u(t) = (Ts ωN/π) sinc(ωN t) ∗ F⁻¹(Us(ω))⁷ ⁸.

⁷The sinc function is defined by sinc(t) = sin(t)/t for t ≠ 0, and sinc(0) = 1.
⁸F⁻¹ is the inverse Fourier transform.
This expression shows how the continuous-time signal u can be completely calculated by using only the sampled values {u(kTs)}_{k=−∞,…,∞} of the original continuous-time signal. Note that it requires an infinite number of sampled data points to fully recover the underlying continuous-time signal, even if u takes values for only a finite time period. It also shows that the sinc functions actually form a basis for the set of signals that are band-limited in frequency.
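A sketch of this sinc reconstruction, truncating the (in principle infinite) sum to the available samples; the signal and sampling choices are illustrative assumptions.

```python
# Reconstruction u(t) = sum_k u(kTs) sinc(wN (t - k Ts)), with sinc as in
# footnote 7 and wN = pi/Ts the Nyquist (radial) frequency.
import numpy as np

Ts = 0.1
wN = np.pi / Ts
k = np.arange(-100, 101)
u_samples = np.cos(2 * np.pi * 1.0 * k * Ts)     # 1 Hz, well below Nyquist (5 Hz)

def u_reconstructed(t):
    # np.sinc uses the normalized convention sinc(x) = sin(pi x)/(pi x)
    return np.sum(u_samples * np.sinc(wN * (t - k * Ts) / np.pi))

t0 = 0.537                                        # an off-grid time instant
print(u_reconstructed(t0), np.cos(2 * np.pi * 1.0 * t0))  # nearly equal
```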
Impulse sampling
The ideal sampling and reconstruction scheme discussed above is often referred to as impulse sampling and reconstruction. This is motivated by the fact that the Fourier transform Us of the sampled signal can be interpreted as the (continuous-time) Fourier transform of the (continuous-time) signal

us(t) = u(t) Σ_{k=−∞}^{∞} δc(t − kTs).

This latter signal is related to the original continuous-time signal u through multiplication with a so-called comb of Dirac pulses. The related operations are schematically depicted in Figure 2.4.
[Figure 2.4: Impulse sampling and reconstruction: u(t) is multiplied by the Dirac comb Σ_k δc(t − kTs), yielding us(t), which is passed through a low-pass filter to give the reconstruction ur(t).]
In this figure the (artificial) signal us(t) is a continuous-time signal that only takes values unequal to zero at t = kTs for integer values k ∈ ℤ. At these values of t the signal becomes unbounded, being a δ-function with a weight that is equal to u(kTs).
A verification of the validity of this transform pair is added in the appendix. Considering this transform pair, a few remarks have to be made.
- Note that while UN(ω) takes its values on a continuous region of ω, only N discrete values of UN are necessary for reconstructing the original signal ud. These N discrete values are N points within one period of the periodic function UN(ω).
- The DTFT is periodic with period 2π/Ts.
- The sequence {UN(ω), ω = ℓωs/N, ℓ = 0, …, N−1} is defined as the Discrete Fourier Transform (DFT) of the signal ud(k), k = 0, …, N−1. It is given by

UN(ℓωs/N) = Σ_{k=0}^{N−1} ud(k) e^{−i(2π/N)ℓk},  ℓ = 0, …, N−1.

- The inverse DFT, defined by (2.35), also defines a time-domain sequence outside the interval [0, N−1]. Actually it induces a periodic extension of the original time sequence ud(k), as the reconstructed signal (2.35) is periodic with period N.
- For reasons of symmetry, the DTFT of a real-valued signal satisfies

UN(−ω) = UN(ω)*.

As a result the DTFT is completely determined by UN(ω) for ω in the interval [0, π/Ts]. This implies that the one-to-one mapping between time and frequency domain actually takes place between N real-valued time-domain samples and N/2 complex-valued frequency-domain samples.
In very many situations discrete-time signals are analyzed without taking account of the fact that they originate from sampled continuous-time signals. Similar to the situation of the previous section, this implies that in that case the expressions for the DTFT are used with Ts = 1:

UN(ω) = Σ_{k=0}^{N−1} ud(k) e^{−iωk}    (2.36)

ud(k) = (1/N) Σ_{ℓ=0}^{N−1} UN(2πℓ/N) e^{i(2π/N)ℓk}.    (2.37)
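The sketch below evaluates the pair (2.36)-(2.37) directly and checks it against numpy's FFT convention, together with the conjugate symmetry of the DFT of a real-valued sequence; the test sequence is an arbitrary illustrative choice.

```python
# DFT pair (2.36)-(2.37) with Ts = 1, evaluated explicitly.
import numpy as np

rng = np.random.default_rng(3)
N = 8
ud = rng.standard_normal(N)                # real-valued time-domain samples

l = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(l, l) / N)
UN = W @ ud                                # (2.36) on the grid omega_l = 2*pi*l/N
print(np.allclose(UN, np.fft.fft(ud)))     # matches numpy's convention

ud_rec = (W.conj() @ UN) / N               # inverse DFT (2.37)
print(np.allclose(ud_rec, ud))             # round trip recovers the signal

# Conjugate symmetry: bin N-l corresponds to frequency -2*pi*l/N.
print(np.allclose(UN[1:], np.conj(UN[1:][::-1])))
```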
In many books on discrete-time signal processing this is the only situation that is considered. Discrete-time Fourier transforms, spectral densities and periodograms will then generally be considered over the frequency interval ω ∈ [0, π], being half of a single period of the corresponding periodic function in the frequency domain. Whenever we connect a sampling time to the discrete-time signal, ω = π gets the interpretation of being equal to half of the (radial) sampling frequency.
Spectral properties of finite-time sampled signals
Similar to the situation of infinite-time signals, we can exploit Parseval's relation for quantifying the energy and power of finite-time (deterministic) sampled signals. Consider the Discrete Fourier Transform as discussed above. Then

Σ_{k=0}^{N−1} ud(k)² = (1/N) Σ_{k=0}^{N−1} |UN(kωs/N)|²    (2.38)

(1/N) Σ_{k=0}^{N−1} ud(k)² = (1/N) Σ_{k=0}^{N−1} (1/N) |UN(kωs/N)|².    (2.39)
It may be clear that the first expression is used for signals having the character of finite energy, while the second expression is specifically used for finite power signals. Note that over a finite time interval this distinction is not really relevant, as the operation of dividing by a finite N is just a matter of scaling. The main difference has to be found in the corresponding asymptotic analysis, when N → ∞. Note that the expressions above are actually alternatives for the integral expressions for signal power as presented in Proposition 2.4.3. For finite-time signals there is no need to take the integral over the power spectral density as in (2.24); the power also results from summing the squared magnitude of the DFT over an equidistant frequency grid.
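A quick numerical confirmation of the Parseval relations (2.38)-(2.39) on an arbitrary illustrative test sequence:

```python
# Energy and power computed in time and frequency domain coincide.
import numpy as np

rng = np.random.default_rng(4)
N = 256
ud = rng.standard_normal(N)
UN = np.fft.fft(ud)

print(np.isclose(np.sum(ud**2), np.sum(np.abs(UN)**2) / N))    # (2.38)
print(np.isclose(np.mean(ud**2), np.mean(np.abs(UN)**2 / N)))  # (2.39)
```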
E[x(t) − xr(t)]² = 0.
satisfies

Rxd(k) = Rx(kTs)

where Rx is the correlation function of the continuous-time process. The discrete-time correlation function is simply a sampled version of the continuous-time one.
The power spectral density of xd is defined by the DTFT of Rxd according to:

Φxd(ω) = Σ_{k=−∞}^{∞} Rxd(k) e^{−iωkTs}    (2.41)

Rxd(k) = (Ts/2π) ∫_{2π/Ts} Φxd(ω) e^{iωkTs} dω.    (2.42)

As a consequence the mean power of the sampled signal, E xd(k)² = E x(kTs)², satisfies:

E xd(k)² = Rxd(0) = (Ts/2π) ∫_{2π/Ts} Φxd(ω) dω.    (2.43)
It has already been discussed that we can analyze a realization of a stochastic process x in two different ways: either as a deterministic finite-power signal, where its finite-time power spectrum is governed by the periodogram

(1/N) |XN(ω)|²

which according to the proof of Proposition 2.4.3 leads to the expression

Φx(ω) = Σ_{τ=−∞}^{∞} Rx^N(τ) e^{−iωτ};

In the former situation it is not trivial that the spectral properties, i.c. the periodogram, converge for N → ∞. However it can be verified that in this situation, under minor conditions on the stochastic process,

lim_{N→∞} E (1/N) |XN(ω)|² = Φx(ω).    (2.47)

For more details see Ljung (1999; 2D.3, p. 51).
analyzing and processing measurement signals we have to face the situation that we have to deal with both types of signals at the same time. For instance, consider the situation that we excite a process with a user-designed signal, e.g. a sum of sinusoids, a step function etc., and we observe the output signal that is produced by the system as a response to this input signal.
The observation of the output signal will generally not be exact, i.e. the measured output signal will be contaminated by noise. The typical behaviour of a disturbance or noise signal is that it is (a) not measurable and (b) changes over different realizations, i.e. when the experiment is repeated a different disturbance signal will be present, but probably with the same statistical properties.
As a consequence one will frequently have to deal with signals of the form

y(k) = w(k) + v(k)

where w(k) is some deterministic signal, being the result of a measurable signal that is processed through a (linear) system, and v(k) is a realization of a zero-mean stationary stochastic process.
In order to analyze the signal y(k), e.g. for determining its frequency content, or the distribution of its energy/power over frequency, we have to determine what its principal characteristics are.
If we consider y(k) to be a stochastic process, then this process will generally be nonstationary. This can be understood by realizing that

E y(k) = w(k),

which generally varies with k.
For dealing with these types of signals in a way that facilitates their analysis, a generalized expectation operator is introduced, denoted by Ē⁹, and defined by

Ē y(k) = lim_{N→∞} (1/N) Σ_{k=0}^{N−1} E y(k).

For a stationary stochastic process y this reduces to Ē y(k) = E y(k).
Note again that the expectation E is taken over the stochastic components in the signal, while the time averaging is taken over the deterministic components. For a deterministic sequence to be quasi-stationary it has to be bounded, in the sense that the limit

lim_{N→∞} (1/N) Σ_{k=0}^{N−1} u(k)u(k − τ)

exists.
Correlation function:

Ry(τ) := Ē[y(k) y(k − τ)].

⁹Note that the notation used can be slightly misleading. The expression Ē y(k) is not a function of k anymore, since an averaging over k is involved.
Cross-correlation function:

Ryu(τ) := Ē[y(k) u(k − τ)].

Using these respective terms, one has to realize that they are formally correct only in the case that we are dealing with a stationary stochastic process.
The generalized expectation operator also induces a generalized notion of the power of a signal:

Py := Ē[y²(k)].

In the case of a stationary stochastic process this equals the ensemble-average power of the process, while for a deterministic sequence it reflects the time-average power of the signal.
It follows directly from the above relations and the inverse DFT that

Ē[y²(k)] = (1/2π) ∫_{−π}^{π} Φy(ω) dω,

and thus the power spectral density Φy(ω) clearly reflects the distribution of the power of y over frequency.
Relating this notion of power of a quasi-stationary signal to the previously used notion of power of a deterministic finite-power signal (the latter being defined on the basis of the periodogram), it appears that for y being quasi-stationary:

(1/N) Ē|YN(ω)|² → Φy(ω) (weakly) for N → ∞,

meaning that the convergence is weak, i.e. it holds under some restrictive conditions. For more details see Ljung (1999, pp. 38-39).
Two quasi-stationary signals are called uncorrelated if their cross-correlation function equals zero.
with w(k) a quasi-stationary deterministic signal with spectrum Φw(ω) and v(k) a zero-mean stationary stochastic process with spectrum Φv(ω). Then

(a) Ē[w(k)v(k − τ)] = lim_{N→∞} (1/N) Σ_{k=0}^{N−1} w(k) E v(k − τ) = 0, as v is zero-mean. So by definition w and v are uncorrelated.
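A small sketch of this uncorrelatedness in practice: the time-averaged cross-correlation estimate between an illustrative deterministic component w and zero-mean noise v tends to zero as N grows. The signal choices are assumptions for illustration only.

```python
# Time-average estimate of E-bar[w(k) v(k - tau)] for y(k) = w(k) + v(k).
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
k = np.arange(N)
w = np.sin(2 * np.pi * 0.01 * k)      # quasi-stationary deterministic part
v = rng.standard_normal(N)            # zero-mean stationary noise

def Rwv(tau):
    """Sample cross-correlation over the available data."""
    return np.mean(w[tau:] * v[:N - tau])

print([round(Rwv(tau), 4) for tau in (0, 1, 5)])   # all close to 0
```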
y(t) = Gc(p)u(t),

where Gc(s) is a rational transfer function that specifies a linear, time-invariant, finite-dimensional system.
If the input and output signals related to this system are sampled with a radial sampling frequency ωs, the first question to be answered is: can we find a discrete-time systems relation between the sampled signals u(kTs) and y(kTs)?
If the continuous-time signals u and y are band-limited, the sampling and reconstruction
analysis as presented in section 2.4.3 can be used to analyze the above question. Under the
condition that U () = 0 for || > s /2, the use of the reconstruction equation for u (2.33)
leads to (see the Appendix for a detailed analysis)
y(kTs ) = gd (
)u((k
)Ts ) (2.48)
=
s
with gd (
) = g(
Ts )sinc( )d. (2.49)
2
This shows two important things. Firstly, the equivalent discrete-time system will in general not be causal, i.e. $g_d(\ell) \ne 0$ for $\ell < 0$; secondly, it will generally not have a finite-dimensional representation, which means that the discrete-time system $G_d(z) = \sum_{k=-\infty}^{\infty} g_d(k)z^{-k}$ cannot simply be written as a rational transfer function
$$G_d(z) = \frac{b_0 z^{n_b} + \cdots + b_{n_b}}{z^{n_a} + a_1 z^{n_a-1} + \cdots + a_{n_a}}. \quad (2.50)$$
The above two aspects show the shortcomings of this way of arriving at a discrete-time representation of the concerned system. One of the reasons for these difficulties is that the sampling and reconstruction result of section 2.4.3 for band-limited signals requires an infinite number of data for reconstruction of the continuous-time signal.
Now one can again raise the question whether we can formulate a discrete-time system
relation between the sampled-data inputs and outputs, given a continuous-time system
y(t) = Gc (p)u(t).
Writing
$$y(kT_s) = \int_{\tau=0}^{\infty} g_c(\tau)u(kT_s-\tau)\,d\tau \quad (2.51)$$
$$= \sum_{\ell=1}^{\infty}\int_{\tau=(\ell-1)T_s}^{\ell T_s} g_c(\tau)u(kT_s-\tau)\,d\tau \quad (2.52)$$
and assuming that u is constant over every sampling interval (a zero-order-hold input), so that $u(kT_s-\tau) = u_d(k-\ell)$ for $\tau \in ((\ell-1)T_s, \ell T_s]$, it follows that
$$y(kT_s) = \sum_{\ell=1}^{\infty} g_d(\ell)\,u_d(k-\ell), \quad \text{with } g_d(\ell) = \int_{(\ell-1)T_s}^{\ell T_s} g_c(\tau)\,d\tau. \quad (2.53)$$
This equivalent discrete-time system appears to be causal ($g_d(k) = 0$ for $k < 0$), and it can also be shown to have a finite-dimensional representation. This latter phenomenon can be characterized easily when using state space representations of both continuous-time and discrete-time systems.
Given a continuous-time system G(s) represented by the state space form
$$\dot{x}(t) = A_c x(t) + B_c u(t), \qquad y(t) = Cx(t) + Du(t),$$
the zero-order-hold equivalent discrete-time system for a sampling period $T_s$ is given by
$$A_d = e^{A_c T_s} \quad (2.60)$$
$$B_d = \int_0^{T_s} e^{A_c\tau}\,d\tau\; B_c. \quad (2.61)$$
If the continuous-time system has state-space dimension n, then the equivalent discrete-time system has the same state-space dimension.
For more details see e.g. the textbook by Astrom and Wittenmark (1984).
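As an aside, both matrices (2.60)-(2.61) can be computed from a single matrix exponential of an augmented matrix; the following MATLAB sketch shows this (the system matrices are illustrative choices, not taken from the text):

    % ZOH discretization of a continuous-time state-space model
    Ac = [0 1; -2 -3];   Bc = [0; 1];   Ts = 0.5;    % illustrative data
    n = size(Ac,1); m = size(Bc,2);
    % expm of the augmented matrix delivers A_d (2.60) and B_d (2.61) at once
    M  = expm([Ac Bc; zeros(m, n+m)] * Ts);
    Ad = M(1:n, 1:n);
    Bd = M(1:n, n+1:n+m);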
Frequency domain interpretation
A frequency domain formulation of the sampled system can be obtained by applying a DTFT to (2.55). For u being a deterministic signal and under the assumption of zero initial conditions (i.e. u(t) = 0 for t < 0) it follows that
$$Y(\omega) = G_d(e^{i\omega T_s})U(\omega)$$
with $G_d(e^{i\omega T_s}) = \sum_{k=-\infty}^{\infty} g_d(k)e^{-i\omega kT_s}$ the DTFT of $g_d$. The notation for the argument of $G_d$ has been chosen different from the argument of U and Y; this is done to allow a simple relation with the z-transform of $g_d$, as indicated in the next subsection.
Similar to the situation of sampled-data signal spectra, the frequency response $G_d(e^{i\omega T_s})$ of the sampled-data system is periodic as a function of $\omega$ with period $2\pi/T_s = \omega_s$.
With the forward and backward shift operators q and $q^{-1}$, defined by
$$qu_d(k) = u_d(k+1), \qquad q^{-1}u_d(k) = u_d(k-1),$$
the sampled system can be written as $y_d(k) = G(q)u_d(k)$ with
$$G(q) = \sum_{\ell=0}^{\infty} g(\ell)q^{-\ell}.$$
In the notation g(k) the subscript d is discarded for simplicity of notation. The sequence $\{g(k)\}_{k=0,1,\ldots}$ is the pulse response of the system. With slight abuse of notation, we will also refer to G(q) as the transfer function of the system. Strictly speaking, however, the transfer function is defined by the function G(z):
$$G(z) = \sum_{k=0}^{\infty} g(k)z^{-k}. \quad (2.62)$$
A discrete-time system is called stable if $\sum_{k=0}^{\infty}|g(k)| < \infty$. This condition means that the series expansion (2.62) is convergent for $|z| \ge 1$, which implies that G(z) is analytic (i.e. has no poles) in the region given by $|z| \ge 1$, i.e. on and outside the unit circle in the complex plane.
Additionally, a discrete-time system will be called monic if g(0) = 1.
Consequently, for a sinusoidal input $u(k) = \cos(\omega k)$ the stationary response of the system is
$$y(k) = |G(e^{i\omega})|\cos(\omega k + \varphi) \quad (2.66)$$
with $\varphi = \arg[G(e^{i\omega})]$.
The complex-valued function $G(e^{i\omega})$ is referred to as the frequency response of the discrete-time system. It evaluates the transfer function in the complex plane over the unit circle $z = e^{i\omega}$. Note the difference with a continuous-time system, where the frequency response is reflected by the evaluation of G(s) over the imaginary axis $s = i\omega$.
For discrete-time systems with a real-valued pulse response (and thus real-valued coefficients) it holds that $G(e^{-i\omega}) = \overline{G(e^{i\omega})}$, and so for reasons of symmetry, full information on the frequency response of the system is obtained by $G(e^{i\omega})$ for $\omega \in [0, \pi]$. In Figure 2.6 this is illustrated for a first-order system, given by
$$G(z) = \frac{z-b}{z-a}. \quad (2.67)$$
Figure 2.6: Zero/pole location and evaluation of the frequency response of a first-order discrete-time system.
The first equation generates the amplitude Bode plot, whereas the second defines the phase Bode plot. For the considered first-order system these Bode plots are given in Figure 2.7, where the values b = 0.3 and a = 0.8 are chosen. Note that the frequency function is given for frequencies up to $\omega = \pi$.
In this discrete-time case, unlike the situation in the continuous-time case, there is no asymptotic behaviour for the frequency tending to infinity. Note that in the discrete-time case the phase contribution of every (real) zero varies between 0° and +180°, while the contribution of each (real) pole varies between 0° and −180°. For a complex conjugate pair of zeros/poles it can simply be verified that the contribution to the phase in $\omega = \pi$ is given by respectively +360° and −360°.
Figure 2.7: Bode-amplitude and Bode-phase plot of the first-order discrete-time system.
and consequently
$$\Phi_{uy}(\omega) = G^*(e^{i\omega})\Phi_u(\omega) = G(e^{-i\omega})\Phi_u(\omega). \quad (2.76)$$
The combination of the above results is shown in Figure 2.9, and leads to the expression
$$\Phi_y(\omega) = |G(e^{i\omega})|^2\,\Phi_u(\omega).$$
Figure 2.8: Cross-correlation function of two signals that are related through a dynamical system G.
Figure 2.9: Auto-correlation function of two signals that are related through a dynamical system G.
Figure 2.10: Two signals y1, y2 originating from the same source signal u.
If there are two dynamical systems G1, G2 as depicted in Figure 2.10, that generate y1, y2, then application of the formulas above leads to the results:
$$y_2 = G_2 G_1^{-1} y_1$$
and therefore
$$\Phi_{y_2y_1} = G_2 G_1^{-1}\Phi_{y_1} \quad (2.79)$$
$$= G_2 G_1^{-1}|G_1|^2\Phi_u = G_2 G_1^{-1}[G_1 G_1^*]\Phi_u \quad (2.80)$$
$$= G_2 G_1^*\Phi_u \quad (2.81)$$
so that
$$\Phi_{y_2y_1}(\omega) = G_2(e^{i\omega})G_1(e^{-i\omega})\Phi_u(\omega).$$
If u is a deterministic sequence for which the DTFT exists, then additionally, under the assumption of zero initial conditions (i.e. u(t) = 0, t < 0):
$$Y(\omega) = G(e^{i\omega})U(\omega).$$
y = fft(x)
A DFT is calculated for a discrete-time signal present in vector x with length N according to
$$X_N(k) = \sum_{j=1}^{N} x(j)e^{-i\frac{2\pi}{N}(j-1)(k-1)} \quad (2.82)$$
x = ifft(y)
[SYSD] = c2d(SYSC,Ts,Method)
[SYSC] = d2c(SYSD,Method)
Besides the zero-order-hold equivalence, there are more possibilities for converting between continuous-time and discrete-time systems.
Frequency responses of discrete-time systems can be obtained by the commands bode and
freqresp.
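A minimal usage sketch of these commands (the first-order system is an illustrative choice; the Control System Toolbox is assumed available):

    SYSC = tf(1, [1 2]);          % G_c(s) = 1/(s+2), illustrative
    Ts   = 0.1;                   % sampling interval
    SYSD = c2d(SYSC, Ts, 'zoh');  % zero-order-hold equivalent, cf. (2.60)-(2.61)
    bode(SYSD);                   % frequency response up to the Nyquist frequency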
Summary of transform definitions:

CTFT:
$$U(\omega) = \int_{-\infty}^{\infty} u(t)e^{-i\omega t}\,dt, \qquad u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} U(\omega)e^{i\omega t}\,d\omega$$

DTFT:
$$U_s(\omega) := \sum_{k=-\infty}^{\infty} u_d(k)e^{-i\omega kT_s}, \qquad u_d(k) = \frac{T_s}{2\pi}\int_{2\pi/T_s} U_s(\omega)e^{i\omega kT_s}\,d\omega$$

DTFT, $T_s = 1$:
$$U_s(\omega) := \sum_{k=-\infty}^{\infty} u_d(k)e^{-i\omega k}, \qquad u_d(k) = \frac{1}{2\pi}\int_{2\pi} U_s(\omega)e^{i\omega k}\,d\omega$$

DFT:
$$U_N\Big(\frac{\ell\omega_s}{N}\Big) = \sum_{k=0}^{N-1} u_d(k)e^{-i\frac{2\pi}{N}\ell k}, \qquad u_d(k) = \frac{1}{N}\sum_{\ell=0}^{N-1} U_N\Big(\frac{\ell\omega_s}{N}\Big)e^{i\frac{2\pi}{N}\ell k}$$
2.9 Summary
In this section a brief review and summary of the basic concepts in signals and systems analysis has been presented, and the appropriate notation has been set. In order to deal with signals that are composed of both a deterministic and a stochastic component (as will happen in many engineering applications of measured signals) the notion of quasi-stationary signals has been discussed. For a detailed treatment of the fundamentals the reader is referred to more specialized textbooks, as listed in the bibliography.
Appendix
Proof of (2.25); DTFT of periodic signal.
Combining the two expressions:
$$u_d(k) = \frac{T_s}{2\pi}\int_{2\pi/T_s} U_s(\omega)e^{i\omega kT_s}\,d\omega \quad (2A.1)$$
$$u_d(k) = \sum_{\ell=0}^{N_0-1} a_\ell\,e^{i\frac{2\pi\ell}{N_0}k}, \quad (2A.2)$$
showing that
$$U_s(\omega) = \frac{2\pi}{T_s}\sum_{\ell=0}^{N_0-1} a_\ell\,\delta_c(\omega - \ell\omega_0).$$
Extending the above expression to the full domain of $\omega$ this leads to
$$U_s(\omega) = \frac{2\pi}{T_s}\sum_{\ell=-\infty}^{\infty} a_\ell\,\delta_c(\omega - \ell\omega_0).$$
with
$$g_d(\ell) = \int_{-\infty}^{\infty} g_c(\tau)\,\mathrm{sinc}(\omega_N(\ell T_s - \tau))\,d\tau \quad (2A.5)$$
$$= \int_{-\infty}^{\infty} g_c(\ell T_s - \tau)\,\mathrm{sinc}(\omega_N\tau)\,d\tau \quad (2A.6)$$
where the latter equation follows from the fact that the integral expression is a convolution, which can be rewritten by the change of variables $\tau \to \ell T_s - \tau$.
Example 2A.2 (DTFT of a periodic signal) Let u(t) be a periodic signal with length $N = rN_0$ and basic period $N_0$. Then
$$U_N(\omega) = \sum_{t=0}^{rN_0-1} u(t)e^{-i\omega t} = \sum_{\ell=1}^{r}\sum_{m=0}^{N_0-1} u(m)e^{-i\omega[(\ell-1)N_0+m]} = \sum_{\ell=1}^{r} e^{-i\omega(\ell-1)N_0}\sum_{m=0}^{N_0-1} u(m)e^{-i\omega m}. \quad (2A.7)$$
showing that
$$R_u^N(0) = \frac{T_s}{2\pi}\int_{-\pi/T_s}^{\pi/T_s}\frac{1}{N}|U_N(\omega)|^2\,d\omega.$$
With Lemma 2A.1 the sum of exponentials will equal $N\delta(k-m)$,¹⁰ which proves the validity of the transform pair.
Bibliography
K.J. Astrom and B. Wittenmark (1984). Computer Controlled Systems: Theory and Design. Prentice Hall Inc., Englewood Cliffs, NJ.
R.L. Fante (1988). Signal Analysis and Estimation - An Introduction. John Wiley & Sons, Inc., New York.
G.M. Jenkins and D.G. Watts (1968). Spectral Analysis and its Applications. Holden-Day, Oakland, CA.
E.W. Kamen and B.S. Heck (2007). Fundamentals of Signals and Systems Using the Web and Matlab. Prentice Hall, Upper Saddle River, NJ, 3/e.
L. Ljung (1999). System Identification - Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 2/e.
L. Ljung and T. Glad (1994). Modeling of Dynamic Systems. Prentice Hall, Englewood Cliffs, NJ.
A.V. Oppenheim and R.W. Schafer (1989). Discrete-Time Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
C.L. Phillips, J.M. Parr and E.A. Riskin (2008). Signals, Systems and Transformations. Prentice Hall, Englewood Cliffs, NJ, 4/e.
K.S. Shanmugan and A.M. Breipohl (1988). Random Signals - Detection, Estimation and Data Analysis. John Wiley & Sons, Inc., New York.
C.W. Therrien (1992). Discrete Random Signals and Statistical Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
¹⁰δ(·) is the discrete pulse function, i.e. δ(k) = 1 for k = 0 and δ(k) = 0 elsewhere.
Chapter 3
Identification of nonparametric models
3.1 Introduction
The identification of nonparametric system models is very often a first step in obtaining experimental information on the dynamic properties of a dynamical system. The most common examples of nonparametric system properties are:
Frequency response in terms of the Bode amplitude and phase plot;
Nyquist curve;
Step and/or pulse response.
These phenomena are helpful in assessing the properties of a dynamical system, and are often used by system engineers for purposes of systems analysis and/or synthesis. They are referred to as nonparametric models whenever they are not constructed from a model representation with a limited number of coefficients, as e.g. a state space model with a limited state dimension. This is despite the fact that every frequency response or time signal is stored digitally in our computer memory by a finite number of coefficients.
In order to specify nonparametric models of dynamic systems on the basis of measurement data, attention has to be given to related nonparametric representations of signal properties, such as correlation functions and signal spectra. Therefore the estimation of correlation functions and spectra of signals is a necessary part of nonparametric identification.
In this chapter, we will consider a data generating system of the form
$$y(t) = G_0(q)u(t) + v(t) \quad (3.1)$$
where v is a zero-mean stationary stochastic process with spectral density $\Phi_v(\omega)$ and u is a quasi-stationary deterministic signal, being uncorrelated to v. All statistical properties of estimators will be based on the stochastic process v only, and not on realizations of input signals. The input signal u is considered to be a fixed measured data sequence; this still allows u to be generated as a realization of a stochastic process¹.
In the sequel of this chapter, several principles will be discussed to obtain information on
G0 from measurement samples of u and y.
¹For the situation of stochastic input signals the reader is referred to e.g. Priestley (1981) and Broersen (1995).
or equivalently,
$$R_{yu}(\tau) = \sum_{k=0}^{\infty} g_0(k)R_u(\tau-k) \quad (3.3)$$
where $\{g_0(k)\}_{k=0,\ldots,\infty}$ is the pulse response sequence of $G_0$. Equation (3.3) is known as the Wiener-Hopf equation.
Note that by correlating the input and output signals of $G_0$ the disturbance signal v has been eliminated from the equations.
If the input signal is a white noise process, i.e. $R_u(\tau) = \sigma_u^2\delta(\tau)$, then it follows straightforwardly that
$$g_0(\tau) = \frac{R_{yu}(\tau)}{\sigma_u^2}.$$
If additionally the pulse response is assumed to be of finite length $n_g$, the set of equations (3.3) can be written as a finite matrix equation:
$$\begin{bmatrix} R_{yu}(0)\\ R_{yu}(1)\\ \vdots\\ R_{yu}(n_g-1)\end{bmatrix} = \begin{bmatrix} R_u(0) & R_u(1) & \cdots & R_u(n_g-1)\\ R_u(1) & R_u(0) & & \vdots\\ \vdots & & \ddots & \\ R_u(n_g-1) & \cdots & & R_u(0)\end{bmatrix}\begin{bmatrix} g_0(0)\\ g_0(1)\\ \vdots\\ g_0(n_g-1)\end{bmatrix} \quad (3.7)$$
If there are only N measurement samples of the signal available, the most straightforward estimate of the correlation function seems to be
$$R_u^N(\tau) := \frac{1}{N}\sum_{t=0}^{N-1} u(t)u(t-\tau), \quad |\tau| \le N-1 \quad (3.9)$$
known as the sample correlation function. Note that in the above expression for the sample correlation, the effective summation interval has length $N-|\tau|$, as we cannot use measurements of u(t) for t < 0 or $t \ge N$. The following properties can be formulated for this estimate, provided that u is not deterministic.
(a) $E R_u^N(\tau) = \frac{N-|\tau|}{N}R_u(\tau)$, and thus the estimate is biased for $\tau \ne 0$. However asymptotically, for $N\to\infty$, the bias will disappear, i.e. $\lim_{N\to\infty} E R_u^N(\tau) = R_u(\tau)$.
(c) $\mathrm{var}(R_u^N(\tau)) = O(1/N)$, meaning that for large enough N the variance becomes proportional to 1/N, and thus the variance tends to zero for an infinite number of data. The exact expressions for the (co)variance of the estimates are rather complicated. Approximate expressions for large N are known for several specific situations.
More complex expressions are available for u being a filtered white noise process with bounded moments up to order 4 (see Soderstrom and Stoica, 1989, pp. 570).
If u is deterministic, then the above analysis does not apply, as in this case simply $R_u(\tau) = \lim_{N\to\infty} R_u^N(\tau)$, being a non-stochastic variable, provided that the limit exists.
Whereas the sample correlation estimate is not unbiased for finite N, it has an additional property that makes it attractive to use. This is formulated in the following lemma (see e.g. Oppenheim and Schafer (1989)).
Lemma 3.3.1 Let u be quasi-stationary, defined on the time interval [0, N−1]. Consider the sample correlation
$$R_u^N(\tau) := \frac{1}{N}\sum_{t=0}^{N-1} u(t)u(t-\tau), \quad |\tau| \le N-1$$
$$:= 0, \quad |\tau| \ge N. \quad (3.12)$$
Then the Discrete-Time Fourier Transform of this sample correlation satisfies:
$$\sum_{\tau=-\infty}^{\infty} R_u^N(\tau)e^{-i\tau\omega} = \frac{1}{N}|U_N(\omega)|^2 \quad (3.13)$$
with $U_N(\omega)$ the DTFT of the signal, $U_N(\omega) = \sum_{t=0}^{N-1} u(t)e^{-i\omega t}$.
This lemma states that the sample correlation function is related through Fourier transform to the periodogram $\frac{1}{N}|U_N(\omega)|^2$ (see section 2.4). Since there exist efficient and fast computational methods for calculating Fourier transforms of signals (Fast Fourier Transform), it is not uncommon to calculate the sample correlation function of a signal by first determining $U_N(\omega)$, and subsequently inverse Fourier transforming the periodogram to obtain $R_u^N(\tau)$.
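A minimal sketch of this FFT-based computation (assuming a measured column vector u; zero-padding to length 2N avoids the circular overlap of the FFT):

    N  = length(u);
    U  = fft([u; zeros(N,1)]);     % transform of the zero-padded signal
    R  = ifft(abs(U).^2) / N;      % inverse transform of the (padded) periodogram
    Ru = real(R(1:N));             % R_u^N(tau) for tau = 0,...,N-1, cf. (3.9)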
In line with the previous discussion this estimate is asymptotically unbiased, its variance decays with 1/N, and for the situation of y, u being a multivariate zero-mean stationary stochastic process, the asymptotic covariance of the estimates satisfies
$$\mathrm{cov}\{R_{yu}^N(\tau_1), R_{yu}^N(\tau_2)\} \approx \frac{1}{N}\sum_{m=-\infty}^{\infty}\{R_u(m)R_y(m+\tau_2-\tau_1) + R_{yu}(m+\tau_2)R_{uy}(m-\tau_1)\}. \quad (3.16)$$
In the situation that y and u are related through a dynamical system according to (3.1), a more specific result holds true, as formulated next.
Proposition 3.3.2 Let u and y be related according to (3.1) with $v_0$ a zero-mean stationary stochastic process with bounded fourth moment, and u a deterministic signal. Then
$$\sqrt{N}\,(R_{yu}^N(\tau) - ER_{yu}^N(\tau)) \in As\mathcal{N}(0, \lambda) \quad (3.17)$$
with
$$\lambda := \sum_{t=-\infty}^{\infty} R_{v_0}(t)R_u(t). \quad (3.18)$$
Proof: Substituting (3.1) into the expression for $R_{yu}^N(\tau)$ it follows that
$$R_{yu}^N(\tau) = \frac{1}{N}\sum_{t=0}^{N-1} G_0(q)u(t)\,u(t-\tau) + \frac{1}{N}\sum_{t=0}^{N-1} v(t)u(t-\tau).$$
As a result
$$\sqrt{N}\,(R_{yu}^N(\tau) - ER_{yu}^N(\tau)) = \frac{1}{\sqrt{N}}\sum_{t=0}^{N-1} v(t)u(t-\tau).$$
The asymptotic properties of the right hand side expression are derived in Hakvoort and Van den Hof (1995), based on the work of Ljung (1987) and Hjalmarsson (1993). □
In other words, in the considered situation the sample cross-correlation function is asymptotically unbiased and its asymptotic variance is given by $\lambda/N$. As the variance decays to zero for increasing N, the estimate is also consistent. Asymptotically the estimate has a Gaussian distribution.
With respect to the Fourier transform of the sample cross-correlation function, it can be verified as a simple extension of Lemma 3.3.1 that
$$\sum_{\tau=-\infty}^{\infty} R_{yu}^N(\tau)e^{-i\tau\omega} = \frac{1}{N}Y_N(\omega)U_N(\omega)^*. \quad (3.19)$$
3.3.4 Summary
For (jointly) quasi-stationary signals u and y, the sample auto- and cross-correlation functions
$$R_y^N(\tau) := \frac{1}{N}\sum_{t=0}^{N-1} y(t)y(t-\tau) \quad (3.20)$$
$$R_{yu}^N(\tau) := \frac{1}{N}\sum_{t=0}^{N-1} y(t)u(t-\tau) \quad (3.21)$$
provide asymptotically unbiased estimates of the corresponding correlation functions, with a variance that decays as 1/N.
Consider now an experiment where the system is excited with a sinusoidal input
$$u(t) = c\cos(\omega t).$$
Given the system relation (3.1), the response of the system, after transient effects have disappeared, will be given by
$$y(t) = c\,a\cos(\omega t+\varphi) + v(t), \quad \text{with } a = |G_0(e^{i\omega})|,\ \varphi = \arg G_0(e^{i\omega}).$$
Correlating this output with a sine and cosine of the same frequency gives
$$y_s(N) = \frac{1}{N}\sum_{t=0}^{N-1} ca\cos(\omega t+\varphi)\sin(\omega t) + \frac{1}{N}\sum_{t=0}^{N-1} v(t)\sin(\omega t) \quad (3.26)$$
and similarly
$$y_c(N) = \frac{1}{N}\sum_{t=0}^{N-1} ca\cos(\omega t+\varphi)\cos(\omega t) + \frac{1}{N}\sum_{t=0}^{N-1} v(t)\cos(\omega t) \quad (3.27)$$
$$= \frac{ca}{2}\cos\varphi + \frac{ca}{2}\frac{1}{N}\sum_{t=0}^{N-1}\cos(2\omega t+\varphi) + \frac{1}{N}\sum_{t=0}^{N-1} v(t)\cos(\omega t). \quad (3.28)$$
It can be verified that for $N\to\infty$ the second terms on the right hand side of (3.26) and (3.28) will tend to zero. This will also hold for the third terms in these expressions², provided that the noise signal v does not contain any pure sinusoids with frequency $\omega$. In this situation we obtain:
$$y_s(N) \to -\frac{ca}{2}\sin\varphi \quad (3.29)$$
$$y_c(N) \to \frac{ca}{2}\cos\varphi \quad (3.30)$$
motivating the estimates:
$$\mathrm{Re}\,\hat{G}(e^{i\omega}) = \hat{a}\cos\hat{\varphi} = \frac{2}{c}\,y_c(N) \quad (3.31)$$
$$\mathrm{Im}\,\hat{G}(e^{i\omega}) = \hat{a}\sin\hat{\varphi} = -\frac{2}{c}\,y_s(N) \quad (3.32)$$
or similarly:
$$|\hat{G}(e^{i\omega})| = \hat{a} := \frac{2}{c}\sqrt{y_s(N)^2 + y_c(N)^2} \quad (3.33)$$
$$\arg(\hat{G}(e^{i\omega})) = \hat{\varphi} := \arctan\frac{-y_s(N)}{y_c(N)}. \quad (3.34)$$
Note that by using the relation:
$$\frac{1}{N}Y_N(\omega) = y_c(N) - i\,y_s(N) \quad (3.35)$$
²The variance of the third terms decays with 1/N provided that $\sum_{\tau=0}^{\infty}|R_v(\tau)| < \infty$ (Ljung, 1999).
it follows that
$$\hat{G}(e^{i\omega}) = \frac{Y_N(\omega)}{Nc/2}. \quad (3.36)$$
An estimate has been constructed for the frequency response at the particular frequency $\omega$. Again, by repeating the experiment for several different values of $\omega$, one can obtain insight into the frequency response of the system over a specified frequency region.
As we can observe from example 2.4.1, for the considered input signal it follows that $U_N(\omega) = Nc/2$, showing that the constructed estimate for one frequency can also be written as
$$\hat{G}(e^{i\omega}) = \frac{Y_N(\omega)}{U_N(\omega)}. \quad (3.37)$$
The above estimate will be shown to have a close relationship with the estimate that can be obtained through Fourier analysis, as discussed next.
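A sketch of this sine-wave correlation procedure in MATLAB (the first-order system and noise level are illustrative assumptions, not taken from the text):

    N  = 4096; w = 0.5; c = 1;               % data length, test frequency, amplitude
    t  = (0:N-1)';
    u  = c*cos(w*t);
    y  = filter([0 0.1],[1 -0.9], u) + 0.05*randn(N,1);  % hypothetical system + noise
    ys = mean(y .* sin(w*t));                % y_s(N)
    yc = mean(y .* cos(w*t));                % y_c(N)
    Ghat = (2/c)*(yc - 1i*ys);               % cf. (3.31)-(3.32) and (3.36)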
The estimate
$$\hat{G}_N(e^{i\omega}) = \frac{Y_N(\omega)}{U_N(\omega)}$$
is defined as the Empirical Transfer Function Estimate (ETFE), see Ljung (1987). This name refers to the fact that an estimate of the input/output transfer function is obtained by simply taking the quotient of the Fourier transforms of output and input signal, given a single data sequence of input and output signals. No assumptions on the underlying data generating system have been imposed, except for its linearity.
For analyzing the properties of this estimate, we first have to consider the following result (for the formal statement and proof see Theorem 3A.1):
$$Y_N(\omega) = G(e^{i\omega})U_N(\omega) + R_N(\omega)$$
with $R_N(\omega) = 0$ for all $\omega = \frac{2\pi k}{N}$, $k \in \mathbb{Z}$, if u is a periodic signal with period N.
The term $R_N(\omega)$ reflects the contribution of data samples that are outside the measurement interval [0, N−1]. This contribution will be exactly zero (no leakage) whenever the signal u on the interval [0, N−1] is periodically extended outside this interval.
A second result that we need for the analysis of the ETFE estimate is related to the noise
contribution.
Let v(t) = H(q)e(t), with H stable and e a zero-mean white noise process. Then
$$EV_N(\omega) = 0 \quad \text{for all } \omega = \frac{2\pi k}{N},\ k \in \mathbb{Z}. \quad (3.40)$$
$$E\frac{1}{N}|V_N(\omega)|^2 \to \Phi_v(\omega) \quad \text{for } N\to\infty. \quad (3.41)$$
Again, the formal statement and proof are added in Theorem 3A.2.
Going back to the systems equation (3.1), it follows that $Y_N(\omega) = G_0(e^{i\omega})U_N(\omega) + R_N(\omega) + V_N(\omega)$, and thus
$$\hat{G}_N(e^{i\omega}) = G_0(e^{i\omega}) + \frac{R_N(\omega)}{U_N(\omega)} + \frac{V_N(\omega)}{U_N(\omega)}.$$
Bias of ETFE-estimates
If during the experiment the input signal is taken as a signal in the interval [0, N−1] that is continued periodically outside this interval, then $R_N(\omega) = 0$ for frequencies in the frequency grid $\Omega_N$. Consequently
$$E\hat{G}_N(e^{i\omega}) = G_0(e^{i\omega})$$
and so the ETFE estimate is unbiased at those frequencies in the frequency grid $\Omega_N$ where $U_N(\omega) \ne 0$.
If knowledge about this periodic extension of the input signal is lacking, the ETFE estimate will incur a bias error, reflected by
$$|E\hat{G}_N(e^{i\omega}) - G_0(e^{i\omega})| = \left|\frac{R_N(\omega)}{U_N(\omega)}\right| \le \frac{c_1/\sqrt{N}}{\frac{1}{\sqrt{N}}|U_N(\omega)|}.$$
The bias error will vanish asymptotically at those frequencies where $\frac{1}{\sqrt{N}}|U_N(\omega)|$ remains bounded away from zero, i.e. those frequencies where $\Phi_u(\omega) \ne 0$.
Variance of ETFE-estimates
Since
$$E(|\hat{G}_N(e^{i\omega}) - E\hat{G}_N(e^{i\omega})|^2) = E\left(\left|\frac{V_N(\omega)}{U_N(\omega)}\right|^2\right)$$
and u is considered a deterministic sequence with power spectral density $\Phi_u(\omega) = \frac{1}{N}|U_N(\omega)|^2$, the statistical properties of $V_N(\omega)$, formulated in (3.41), then lead to the property that for all $\omega \in \Omega_N$:
$$\lim_{N\to\infty}\mathrm{var}(\hat{G}_N(e^{i\omega})) = \frac{\Phi_v(\omega)}{\Phi_u(\omega)}.$$
ETFE-estimates from periodic input signals
Consider now an input signal that is periodic with period $N_0$ and measured over r full periods, so that $N = rN_0$. With increasing r, the periodogram $\frac{1}{N}|U_N(\omega)|^2$ will tend to zero at frequencies outside the frequency grid $\omega = \frac{2\pi k}{N_0}$, k integer, while for the frequencies within this grid the periodogram will increase like r. As a result, the variance of the ETFE in these frequency points will decay like 1/r and so tend to 0 for $N\to\infty$.
Stated differently, because of the periodic nature of the input signal, the asymptotic power spectral density function $\Phi_u(\omega)$ of u will only have nonzero contributions in the frequency grid $\Omega_{N_0}$. At those frequencies where $\Phi_u(\omega) \ne 0$ it will contain Dirac pulses, and therefore the variance of the ETFE-estimate will tend to zero. In combination with the fact that the estimate is asymptotically unbiased, it follows that $\hat{G}_N(e^{i\omega})$ is consistent in this case.
ETFE-estimate from general quasi-stationary deterministic input signals
For general input signals the ETFE estimate remains (asymptotically) unbiased. For increasing values of N, the variance contribution approaches $\frac{\Phi_v(\omega)}{\Phi_u(\omega)}$ for all $\omega \in \Omega_N$. However now, for increasing N this expression is not guaranteed to converge to 0; it is equal to the noise-to-input signal ratio at the particular frequencies considered. The appealing result is thus that the variance of an ETFE-estimate at a particular frequency is determined by the noise-to-signal ratio at that frequency. For increasing N the frequency grid becomes more dense, i.e. an estimate is obtained at an increasing number of frequencies, but the variance of these estimates does not improve. Note that in this situation the frequency resolution is given by $2\pi/N$.
The difference between the two experimental situations discussed above is quite remarkable. An unbiased ETFE is obtained either at a fixed and limited number of frequencies with decaying variance, or at a growing number of frequencies³, but with non-vanishing variance.
A possible bias problem induced by leakage can also be avoided by applying tapering, i.e. by taking care that the considered input signal is preceded by $N_g$ zeros and also ends with $N_g$ zeros, where $N_g$ is the length of the pulse response of the system (see Theorem 3A.1). An example of this situation is depicted in Figure 3.1, where a tapered input signal is sketched for N = 200 and $N_g$ = 50.
³Neglecting the frequencies at which a bias error occurs.
Figure 3.1: Tapered input signal (upper figure) and output signal (lower figure) for N = 200, $N_g$ = 50.
Summary of results
The ETFE-estimate
$$\hat{G}_N(e^{i\omega}) = \frac{Y_N(\omega)}{U_N(\omega)}$$
has the following properties:
- it is asymptotically unbiased at the frequencies where the input has power;
- for a periodic input signal (with an integer number of periods in the data), its variance decays to zero for increasing data length, at a fixed number of frequencies;
- for a general quasi-stationary input, its variance tends to the noise-to-input signal ratio $\Phi_v(\omega)/\Phi_u(\omega)$ and does not vanish for increasing N.
Example 3.4.1 The effect of choosing different input signals for obtaining an ETFE is illustrated by applying the estimator to simulation data obtained from the system
$$G_0(z) = \frac{b_1 z^{-1} + \cdots + b_5 z^{-5}}{1 + a_1 z^{-1} + \cdots + a_5 z^{-5}} \quad (3.45)$$
with output noise generated by the noise filter
$$H_0(z) = \frac{1 - 1.38z^{-1} + 0.4z^{-2}}{1 - 1.9z^{-1} + 0.91z^{-2}}, \quad (3.46)$$
while the white noise signal $e_0$ has variance 0.0025, leading to a signal-to-noise ratio at the output of 11.6 dB, being equivalent to around 30% noise disturbance in amplitude on the noise-free output signal. As input signal we have chosen two different options:
Figure 3.2: Left: Bode-amplitude plot of ETFE based on random input with length N =
2048 (dotted), and of data generating system G0 (solid-line); Right: Bode-amplitude plot
of ETFE based on periodic input with N = 2048 and period length N0 = 128 (o), and of
data generating system G0 (solid-line).
(a) A zero mean unit variance white noise signal with length N = 2048;
(b) A zero mean unit variance white noise signal with length N0 = 128, but repeated 16
times, arriving at an input signal with length N = 2048.
The results of the ETFE are sketched in Figure 3.2. It shows the ETFE obtained in situation
(a) compared to the frequency response of G0 . Estimates are obtained for 1024 frequency
points in the interval [0, ]. As is clearly illustrated by the result, the ETFE is very erratic.
The result for situation (b) is given in the right plot, where estimates are obtained for
(only) 64 frequencies in the interval [0, ]. However in this situation the reduced variance
of the estimates is clearly shown.
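The essence of this experiment can be reproduced with a few lines of MATLAB; since the coefficients of the fifth-order system (3.45) are not listed in the text, an illustrative second-order system is substituted here:

    N  = 2048; N0 = 128; r = N/N0;
    u1 = randn(N,1);                         % (a) random input
    u0 = randn(N0,1); u2 = repmat(u0,r,1);   % (b) periodic input, r periods
    G0 = @(u) filter([0 0.2 0.1],[1 -1.5 0.7], u);   % hypothetical G0
    y1 = G0(u1) + 0.05*randn(N,1);
    y2 = G0(u2) + 0.05*randn(N,1);
    Ghat1 = fft(y1)./fft(u1);                % ETFE on the full frequency grid
    Y2 = fft(y2); U2 = fft(u2);
    k  = (1:r:N)';                           % bins where U_N(w) is nonzero
    Ghat2 = Y2(k)./U2(k);                    % ETFE restricted to the excited grid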
The weighting in (3.47) should achieve an averaging of $\hat{G}(e^{i\xi})$ over neighboring frequency points, where the weight of $\hat{G}(e^{i\xi})$ should be (inversely) related to $|\xi-\omega|$; i.e. it should be maximal for $\xi = \omega$ and decrease for increasing $|\xi-\omega|$. This is due to the fact that $\hat{G}(e^{i\xi})$ is a less reliable estimate of $G(e^{i\omega})$ the further $\xi$ is away from $\omega$.
In this averaging, the several measurement points $\hat{G}(e^{i\xi})$ should be weighted with a weight that is inversely proportional to their variance; i.e. ETFE-points that have a large variance are less reliable, and should contribute less to the final estimate. Note that the variance of $\hat{G}(e^{i\xi})$ asymptotically equals $\Phi_v(\xi)/[\frac{1}{N}|U_N(\xi)|^2]$.
Combining both arguments, and employing the assumption that $\Phi_v(\xi)$ can be considered constant over the frequency range around $\omega$ where the window is active, we can write
$$\hat{G}_N(e^{i\omega}) = \frac{\int_{-\pi}^{\pi} W_\gamma(\xi-\omega)|U_N(\xi)|^2\,\hat{G}_N(e^{i\xi})\,d\xi}{\int_{-\pi}^{\pi} W_\gamma(\xi-\omega)|U_N(\xi)|^2\,d\xi} = \frac{\int_{-\pi}^{\pi} W_\gamma(\xi-\omega)\,Y_N(\xi)U_N(\xi)^*\,d\xi}{\int_{-\pi}^{\pi} W_\gamma(\xi-\omega)|U_N(\xi)|^2\,d\xi}, \quad (3.48)$$
In practice the ETFE will be available at a finite number of frequencies. This implies that in smoothing algorithms the integral expressions in (3.48) cannot be calculated exactly; both numerator and denominator will have to be approximated by a discrete convolution on the frequency response samples of $Y_N(\xi)U_N(\xi)^*$ and $|U_N(\xi)|^2$.
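A sketch of such a discretized smoothing operation, assuming YN and UN are the length-N FFTs of the measured signals and using a Hamming frequency window of an illustrative width (hamming and cconv require the Signal Processing Toolbox):

    L = 31; W = hamming(L); W = W/sum(W);    % normalized frequency window
    num = circshift(cconv(YN.*conj(UN), W, N), -(L-1)/2);  % smoothed cross-periodogram
    den = circshift(cconv(abs(UN).^2,   W, N), -(L-1)/2);  % smoothed auto-periodogram
    Ghat_smooth = num ./ den;                % discretized version of (3.48)

The circular convolution respects the periodicity of the frequency axis; the circshift re-centers the (causal) window.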
From the system relation (3.1) it follows that
$$\Phi_{yu}(\omega) = G_0(e^{i\omega})\Phi_u(\omega) \quad (3.50)$$
and so
$$G_0(e^{i\omega}) = \frac{\Phi_{yu}(\omega)}{\Phi_u(\omega)} \quad (3.51)$$
for those frequencies where $\Phi_u(\omega) > 0$. This directly leads to a suggestion for an estimate of the frequency response through
$$\hat{G}(e^{i\omega}) = \frac{\hat{\Phi}_{yu}(\omega)}{\hat{\Phi}_u(\omega)}. \quad (3.52)$$
Thus by estimating the two spectral densities in the above expression, a direct frequency response estimate of the system results.
It will appear that this spectral estimate of a transfer function has a close relationship to the (smoothed) ETFE presented in the previous section.
If the signal u contains periodic components, then the periodogram (as well as the spectral
density) will become unbounded for increasing N . This is illustrated in example 2A.2. As
a result, sinusoidal components in a quasi-stationary signal will appear as clear-cut peaks
in the periodogram.
For other situations the following result can be stated.
Theorem 3.5.1 Consider the periodogram estimate $\hat{\Phi}_u^N(\omega) = \frac{1}{N}|U_N(\omega)|^2$ of a quasi-stationary signal u. Then
(a) $\lim_{N\to\infty} E\hat{\Phi}_u^N(\omega) = \Phi_u(\omega)$;
(b) $\lim_{N\to\infty} E(\hat{\Phi}_u^N(\omega) - \Phi_u(\omega))^2 = (\Phi_u(\omega))^2$;
(c) $E[\hat{\Phi}_u^N(\omega_1) - \Phi_u(\omega_1)][\hat{\Phi}_u^N(\omega_2) - \Phi_u(\omega_2)] = R_N^{(1)}$, with $\lim_{N\to\infty} R_N^{(1)} = 0$ for $|\omega_1-\omega_2| \ge \frac{2\pi}{N}$.
The theorem shows that in the considered situation the periodogram estimate is an asymp-
totically unbiased estimate of the spectral density. However, its variance is equal to the
square of the spectral density itself, and so it does not tend to zero. This situation is
similar to the situation of ETFEs as discussed in the previous section, where actually the
same mechanisms are present. Theorem 3.5.1 is closely related to Theorem 3A.2 which was
formulated for stationary stochastic processes only. In case u is a deterministic sequence,
the expectation operator in Theorem 3.5.1 can simply be discarded, as in that case the
estimate is not a stochastic variable.
Generally, the periodogram estimate for a (non-periodic) measurement signal will be an erratic function over frequency, fluctuating heavily around its correct value.
The arguments and results that are given for estimating auto-spectral densities hold equally well for estimating cross-spectral densities. In that case
$$\hat{\Phi}_{yu}^N(\omega) := \sum_{\tau=-\infty}^{\infty} R_{yu}^N(\tau)e^{-i\tau\omega} = \frac{1}{N}Y_N(\omega)U_N(\omega)^*. \quad (3.56)$$
We can incorporate this mechanism in the spectral estimate, by applying a so-called lag window to the correlation function estimate before applying the Fourier transform:
$$\hat{\Phi}_u^N(\omega) = \sum_{\tau=-\infty}^{\infty} w_\gamma(\tau)\,R_u^N(\tau)e^{-i\tau\omega}. \quad (3.57)$$
Here the lag window satisfies
$$w_\gamma(\tau) = 0, \quad |\tau| > \gamma \quad (3.58)$$
showing that $\gamma > 0$ is a variable that determines the width of the window. This lag window causes $w_\gamma(\tau)R_u^N(\tau)$ to be regularized to zero. The smaller the value of $\gamma$, the bigger the part of the sample correlation estimate that is smoothed out. The higher the value of $\gamma$, the less smoothing is taking place. A typical choice for a lag window is e.g. a rectangular form:
$$w_\gamma(\tau) = 1, \quad 0 \le |\tau| \le \gamma \quad (3.59)$$
$$= 0, \quad |\tau| > \gamma. \quad (3.60)$$
However more general choices are also possible, having a smoother decay of the window towards zero. Three popular choices of windows are sketched in Figure 3.3 and characterized in Table 3.1. For an extensive list of windows the reader is referred to Jenkins and Watts (1968), Brillinger (1981) and Priestley (1981).
Figure 3.3: Lag-windows $w_\gamma(\tau)$; rectangular window (solid), Bartlett window (dashed), and Hamming window (dash-dotted).
In choosing the width $\gamma$ of the lag window, two requirements play a role:
- $\gamma \ll N$, in order to guarantee a sufficient reduction of the variance;
- $|R_u(\tau)| \ll R_u(0)$ for $\tau \ge \gamma$, in order to guarantee that interesting dynamics are not smoothed out.
Table 3.1: Frequency windows $W_\gamma(\omega)$ and lag windows $w_\gamma(\tau)$, $0 \le |\tau| \le \gamma$.

Rectangular:  $W_\gamma(\omega) = \frac{\sin((\gamma+\frac{1}{2})\omega)}{\sin(\omega/2)}$;  $w_\gamma(\tau) = 1$

Bartlett:  $W_\gamma(\omega) = \frac{1}{\gamma}\left[\frac{\sin(\gamma\omega/2)}{\sin(\omega/2)}\right]^2$;  $w_\gamma(\tau) = 1 - \frac{|\tau|}{\gamma}$

Hamming:  $W_\gamma(\omega) = \frac{1}{2}D_\gamma(\omega) + \frac{1}{4}D_\gamma(\omega-\frac{\pi}{\gamma}) + \frac{1}{4}D_\gamma(\omega+\frac{\pi}{\gamma})$, where $D_\gamma(\omega) = \frac{\sin((\gamma+\frac{1}{2})\omega)}{\sin(\omega/2)}$;  $w_\gamma(\tau) = \frac{1}{2}(1 + \cos\frac{\pi\tau}{\gamma})$
The first point refers to a sufficient reduction of variance, whereas the second point refers to the avoidance of substantial bias.
The application of a lag window in the time domain has a direct interpretation as a smoothing operation in the frequency domain. Using the fact that $R_u^N(\tau)$ and $\hat{\Phi}_u^N(\omega)$ are related through Fourier transform, and using the fact that multiplication in the time domain relates to convolution in the frequency domain, the following derivation is straightforward:
$$\hat{\Phi}_u^N(\omega) = \sum_{\tau=-\infty}^{\infty} w_\gamma(\tau)\,R_u^N(\tau)e^{-i\tau\omega} \quad (3.61)$$
$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(\xi-\omega)\,\frac{1}{N}|U_N(\xi)|^2\,d\xi,$$
where the frequency window $W_\gamma(\omega)$ is the Fourier transform of the lag window $w_\gamma(\tau)$, i.e.
$$w_\gamma(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(\omega)e^{i\omega\tau}\,d\omega. \quad (3.65)$$
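A sketch of this lag-window (Blackman-Tukey type) spectral estimate with a Hamming lag window, assuming the sample correlation Ru for lags 0,...,N−1 has already been computed as above, and with an illustrative window width:

    gamma = 30;
    tau   = (0:gamma)';
    wg    = 0.5*(1 + cos(pi*tau/gamma));          % Hamming lag window, cf. Table 3.1
    Rw    = [Ru(1); 2*wg(2:end).*Ru(2:gamma+1)];  % fold tau and -tau (R_u real and even)
    Phi   = @(w) cos(w*(0:gamma)) * Rw;           % estimate (3.57) at frequency w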
Figure 3.4: Frequency windows $W_\gamma(\omega)$ in spectral estimation ($\gamma = 20$, left) and Hamming frequency window for different resolutions (right); frequency axis in rad/sec.
This idea refers to the classical way of reducing the variance of an estimate by taking averages of several independent estimates. Independence of the several estimates will only be possible when the several data segments do not overlap. In general the data segments will be chosen with a length $N_0$ that is a power of 2, in order to facilitate efficient calculation of the periodograms through the Fast Fourier Transform.
Also in this method the conflicting aspects of reducing the variance and obtaining a high frequency resolution are present. Variance reduction is related to the number of averages r that is achieved, while the frequency resolution is related to the number of data samples in a data segment. In finding a satisfactory compromise between these choices, the use of overlapping data segments is also possible.
If u is a stochastic process, this method of periodogram averaging has some relationship with a windowing operation related to the windows discussed in section 3.5.3. This can be observed from the following derivation.
Let $u_j(t)$ denote the signal in the j-th data segment, then
$$\frac{1}{N_0}|U_{N_0,j}(\omega)|^2 = \sum_{\tau=-\infty}^{\infty} R_{u_j}^{N_0}(\tau)e^{-i\tau\omega} = \frac{1}{N_0}\sum_{\tau=-\infty}^{\infty}\sum_{t=0}^{N_0-1} u_j(t)u_j(t-\tau)e^{-i\tau\omega}, \quad (3.68)$$
using the convention that $u_j(t) := 0$ outside the interval $[0, N_0-1]$.
As a result we can write
$$\frac{1}{r}\sum_{j=1}^{r}\frac{1}{N_0}|U_{N_0,j}(\omega)|^2 = \frac{1}{r}\sum_{j=1}^{r}\frac{1}{N_0}\sum_{\tau=-N_0}^{N_0}\sum_{t=0}^{N_0-1} u_j(t)u_j(t-\tau)e^{-i\tau\omega} \quad (3.69)$$
which leads to
$$E\hat{\Phi}_u^N(\omega) = \sum_{\tau=-N_0}^{N_0}\frac{N_0-|\tau|}{N_0}R_u(\tau)e^{-i\tau\omega}, \quad (3.70)$$
showing that effectively a Bartlett lag window with width $N_0$ is applied here.
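A minimal sketch of periodogram averaging over non-overlapping segments (u a given column vector; the segment length is an illustrative choice):

    N0 = 256; r = floor(length(u)/N0);
    P  = zeros(N0,1);
    for j = 1:r
        Uj = fft(u((j-1)*N0+1 : j*N0));
        P  = P + abs(Uj).^2 / N0;       % periodogram of segment j
    end
    Phi_hat = P / r;                    % averaged spectral estimate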
When no smoothing is applied to the ETFE and the spectral estimate, both algorithms
lead to one and the same estimated frequency response GN (ei ).
Figure 3.5: Frequency response estimates by (smoothed) ETFE. Upper: Bode amplitude plot of $G_0$ (solid), ETFE (dashed) and smoothed ETFE with $\gamma$ = 200 (dotted). Lower: Bode amplitude plot of $G_0$ (solid), ETFE with $\gamma$ = 30 (dashed) and ETFE with $\gamma$ = 70 (dash-dotted).
Results are given for several values of the width of the Hamming window that is applied in order to smooth the ETFEs. As is clear from the upper plot in Figure 3.5, the raw ETFE is a very erratic function. Smoothing the ETFE with a very narrow frequency filter ($\gamma$ = 200) reduces the variance drastically.
The lower plot in Figure 3.5 shows that when using a wider frequency window (smaller values of $\gamma$), the variance in the estimate is reduced, however at the cost of an increasing bias. Especially for the given system $G_0$, which has rather sharp curves in its frequency response, an appropriate choice of window is a rather difficult task.
Figure 3.6 shows similar results, now for frequency response estimates that are obtained by spectral analysis, with a Hamming window being applied to the sample cross-covariance function. This sample cross-correlation function, together with the Hamming windows that have been applied, is depicted in Figure 3.7. As is clearly illustrated in this latter figure, the choice $\gamma$ = 10 causes a substantial part of the system dynamics to be filtered out by the window, leading to a substantial bias. The choice $\gamma$ = 30 is the default choice suggested by the appropriate MATLAB-routine.
Figure 3.6: Frequency response estimates by (smoothed) spectral analysis using a Hamming window with lag width $\gamma$ = 10, 30, 70. Bode amplitude (upper) and phase (lower) plot of $G_0$ (solid) and estimates with $\gamma$ = 10 (dashed), $\gamma$ = 30 (dash-dotted) and $\gamma$ = 70 (dotted).
Figure 3.7: Sample cross-correlation function and the applied Hamming lag windows.
Finally we consider the estimation of the noise spectrum $\Phi_v(\omega)$, as specified in (3.1). As the signal v is not directly measurable, estimates of $\Phi_v(\omega)$ will have to be based on input and output measurements also.
As
$$\Phi_v(\omega) = \Phi_y(\omega) - \frac{|\Phi_{yu}(\omega)|^2}{\Phi_u(\omega)}, \quad (3.74)$$
this suggests the estimate
$$\hat{\Phi}_v^N(\omega) = \hat{\Phi}_y^N(\omega) - \frac{|\hat{\Phi}_{yu}^N(\omega)|^2}{\hat{\Phi}_u^N(\omega)}. \quad (3.75)$$
The quantity
$$C_{yu}(\omega) := \sqrt{\frac{|\Phi_{yu}(\omega)|^2}{\Phi_y(\omega)\Phi_u(\omega)}} \quad (3.76)$$
is defined as the coherency spectrum between y and u. Taking positive real values between 0 and 1, it acts as a kind of frequency-dependent correlation coefficient between the input and output. It is closely related to the signal-to-noise ratio of the system, as it follows from simple calculations that
$$\Phi_v(\omega) = \Phi_y(\omega)[1 - C_{yu}(\omega)^2]. \quad (3.77)$$
For $C_{yu}(\omega) = 1$, the noise spectrum is zero at this frequency, and so the output of the system is completely determined by the noise-free part $|G_0(e^{i\omega})|^2\Phi_u(\omega)$. For $C_{yu}(\omega) = 0$, the output spectrum equals the noise spectrum, $\Phi_y(\omega) = \Phi_v(\omega)$, and there is no contribution of the input signal to the output.
Figure 3.8: Estimated coherency spectrum for data in example 3.5.2. Estimates based on the Welch method of periodogram averaging with the data sequence divided into 10 segments of length 256. The frequency axis is in rad/sec.
For the data as handled in example 3.5.2, the estimated coherency spectrum is given in Figure 3.8.
Note that in this example, the dips in the estimated coherency spectrum are not caused by an increasing power of the noise disturbance term at these frequencies, but rather by two zeros of $G_0$ that are located close to the unit circle, with frequencies 0.09 rad/sec and 0.055 rad/sec.
For constructing an estimate of the coherency spectrum it is necessary that the auto- and cross-spectral densities are smoothed versions of the periodogram estimates. If no smoothing is applied, and consequently the (raw) periodograms are taken as spectral estimates, then substitution into (3.78) shows that
$$\hat{C}_{yu}^N(\omega) = \sqrt{\frac{[\frac{1}{N}|Y_N(\omega)U_N(\omega)^*|]^2}{\frac{1}{N}|Y_N(\omega)|^2\cdot\frac{1}{N}|U_N(\omega)|^2}} = 1. \quad (3.79)$$
The estimate of the coherency spectrum will in this situation be fixed to 1, and it will not reveal any relevant information.
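In MATLAB a smoothed coherence estimate is readily obtained by Welch segment averaging; a minimal usage sketch (Signal Processing Toolbox assumed; window and segment sizes are illustrative):

    [Cyu2, w] = mscohere(u, y, hamming(256), 128, 256);  % magnitude-squared coherence
    plot(w, Cyu2);                                       % note: this returns C_yu(w)^2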
R = covf(z,M,maxsize)
R = covf(z,M)
[ir,r,cl] = cra(z,M,NA,PLOT)
ir = cra(z)
For output and input data present in z, correlation analysis provides an estimate of the pulse response of the underlying system delivered in ir, having length M and starting with the direct feedthrough term.
The applied procedure is:
- Estimate an AR-model for the input signal u, i.e. $A(q)u(t) = \varepsilon(t)$ (see chapter 5).
- Construct the signals $u_F(t) = A(q)u(t)$ and $y_F(t) = A(q)y(t)$. A(q) is a (prewhitening) filter constructed to make $u_F$ as white as possible.
A sketch of this prewhitening procedure is given below.
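A minimal implementation sketch of this procedure (the AR order na and pulse response length M are illustrative tuning choices; u, y given as column vectors):

    na  = 10; N = length(u);
    Phi = toeplitz(u(na:N-1), u(na:-1:1));   % regressors: past input samples
    a   = -(Phi \ u(na+1:N));                % least-squares AR coefficients
    A   = [1 a'];                            % prewhitening filter A(q)
    uF  = filter(A, 1, u);  yF = filter(A, 1, y);
    M   = 40;  sig2 = mean(uF.^2);
    ir  = zeros(M,1);
    for tau = 0:M-1                          % g0(tau) ~ R_{yF uF}(tau)/sigma_uF^2
        ir(tau+1) = mean(yF(tau+1:N).*uF(1:N-tau)) / sig2;
    end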
[G] = etfe(DATA,M,N)
[G] = etfe(DATA,M)
[G] = etfe(DATA)
For output and input data present in DATA, an IDDATA-object, a smoothed ETFE of the frequency response is provided in the IDFRD-object G, which is similar to a spectral analysis estimate obtained by smoothing the raw spectral estimates with a Hamming window in the frequency domain. The Hamming window has a lag width equal to M. The estimate G corresponds to
$$\hat{G}(e^{i\omega}) = \frac{\sum_\xi W_\gamma(\xi-\omega)\,Y_N(\xi)U_N(\xi)^*}{\sum_\xi W_\gamma(\xi-\omega)\,|U_N(\xi)|^2},$$
where the Hamming window $W_\gamma(\omega)$ is applied in the (discretized) frequency domain. The IDDATA- and IDFRD-objects are defined within MATLAB's System Identification Toolbox.
By default M=[ ] and no smoothing is applied; in this case G contains the ETFE of the process.
N is the number of frequency points; the frequency points are restricted to be linearly distributed over the interval $(0, \pi]$. N must be a power of 2. By default N = 128.
[G] = spa(DATA,M,w,maxsize)
[G] = spa(DATA)
[Gtf,Gnoi,Gio] = spa(DATA,...)
For output and input data present in the IDDATA-object DATA, a spectral estimate of the frequency response is provided in the IDFRD-object G, using spectral analysis estimates $\hat{\Phi}_{yu}^N(\omega)$ and $\hat{\Phi}_u^N(\omega)$ with a Hamming window with lag width $\gamma$ = M. Additionally the noise spectrum is estimated in Gnoi and the spectrum of the joint input and output signals in Gio. The estimate G corresponds to
$$\hat{G}(e^{i\omega}) = \frac{\sum_{\tau=-\gamma}^{\gamma} w_\gamma(\tau)R_{yu}^N(\tau)e^{-i\tau\omega}}{\sum_{\tau=-\gamma}^{\gamma} w_\gamma(\tau)R_u^N(\tau)e^{-i\tau\omega}}.$$
Pxx = pwelch(x)
[Pxx,w] = pwelch(x)
Given a time-signal in x, an estimate is provided for the spectral density of this signal, by applying the Welch method of periodogram averaging for spectral estimation. w contains the frequencies at which the spectral density is estimated.
Cxy = mscohere(x,y)
For time signals present in vectors x and y, mscohere provides the magnitude-squared coherence estimate Cxy, using Welch's averaged periodogram method.
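The commands can be combined into a short nonparametric identification session; a usage sketch (System Identification Toolbox assumed; u, y, Ts given):

    data = iddata(y, u, Ts);   % pack measured signals into an IDDATA-object
    Ge   = etfe(data);         % raw ETFE
    Gs   = spa(data, 30);      % spectral estimate, Hamming window with gamma = 30
    bode(Ge, Gs);              % compare the two nonparametric estimates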
3.7 Summary
In this section basic methods have been discussed to estimate a nonparametric frequency response (sometimes called an FRF - frequency response function) directly from input and output data.
For periodic input signals and an integer number of periods, the Empirical Transfer Function Estimate (ETFE) leads to an unbiased estimate in a fixed number of frequencies with a variance tending to 0 for increasing data length. For general input signals (e.g. deterministic sequences generated by a random generator) the ETFE will generally be an erratic function of frequency, which requires a smoothing operation before it can be sensibly interpreted.
Appendix
Theorem 3A.1 (Transformation of DTFTs) Let $G(q) = \sum_{k=0}^{\infty} g(k)q^{-k}$ be a dynamical system satisfying $\sum_{k=0}^{\infty} k|g(k)| < \infty$, relating the quasi-stationary signals u(t), y(t) through:
$$y(t) = G(q)u(t)$$
while u satisfies $|u(t)| \le \bar{u}$ for all t (including t < 0). Then
$$Y_N(\omega) = G(e^{i\omega})U_N(\omega) + R_N(\omega)$$
with
(a) If u is periodic with period N, then $R_N(\omega) = 0$ for $\omega = \frac{2\pi k}{N}$, $k \in \mathbb{Z}$.
(b) For general u, $|R_N(\omega)| \le c_1$ with $c_1 = 2\bar{u}\sum_{k=0}^{\infty} k|g(k)|$.
(c) If G has a finite pulse response $G(q) = \sum_{k=0}^{N_g} g(k)q^{-k}$, then $R_N(\omega) = 0$ if u(t) = 0 for $t \in [-N_g, -1] \cup [N-N_g, N-1]$.
Proof: For part (a), write
$$Y_N(\omega) = \sum_{t=0}^{N-1}\sum_{k=0}^{\infty} g(k)u(t-k)e^{-i\omega t} = \sum_{t=0}^{N-1}\sum_{k=0}^{\infty} g(k)e^{-i\omega k}\,u(t-k)e^{-i\omega(t-k)}.$$
By the change of variables $\tau := t-k$:
$$Y_N(\omega) = \sum_{k=0}^{\infty} g(k)e^{-i\omega k}\sum_{\tau=-k}^{N-1-k} u(\tau)e^{-i\omega\tau}.$$
The inner sum can be decomposed as
$$\sum_{\tau=-k}^{N-1-k} u(\tau)e^{-i\omega\tau} = \sum_{\tau=0}^{N-1} u(\tau)e^{-i\omega\tau} + \sum_{\tau=-k}^{-1} u(\tau)e^{-i\omega\tau} - \sum_{\tau=N-k}^{N-1} u(\tau)e^{-i\omega\tau}. \quad (3A.2)$$
If u is periodic with period N, then $u(\tau) = u(\tau+N)$, so that the second term equals $\sum_{\tau=N-k}^{N-1} u(\tau)e^{-i\omega(\tau-N)}$, which for frequencies $\omega = \frac{2\pi k'}{N}$ satisfies
$$\sum_{\tau=N-k}^{N-1} u(\tau)e^{-i\frac{2\pi k'}{N}(\tau-N)} = \sum_{\tau=N-k}^{N-1} u(\tau)e^{-i\frac{2\pi k'}{N}\tau},$$
becoming equal to the third term in (3A.2). As a result, for each value of k, $\sum_{\tau=-k}^{N-1-k} u(\tau)e^{-i\omega\tau} = \sum_{\tau=0}^{N-1} u(\tau)e^{-i\omega\tau} = U_N(\omega)$, showing that for $\omega = \frac{2\pi k'}{N}$, $Y_N(\omega) = G(e^{i\omega})U_N(\omega)$.
The proof of part (b) can be found in Theorem 2.1 in Ljung (1999). The proof of (c) follows from a similar analysis and is left as an exercise. □
Theorem 3A.2 Let
$$v(t) = H(q)e(t)$$
with e a zero-mean white noise process with variance $\sigma_e^2$ and bounded moments of all orders, and H a stable proper transfer function. Let $\Omega_N$ denote the frequency grid determined by
$$\Omega_N := \left\{\frac{2\pi k}{N},\ k = 0, \ldots, N-1\right\}.$$
Then for all $\omega \in \Omega_N$: $EV_N(\omega) = 0$, and $E\frac{1}{N}|V_N(\omega)|^2 \to \Phi_v(\omega)$ for $N\to\infty$.
Bibliography
D.R. Brillinger (1981). Time Series - Data Analysis and Theory. McGraw-Hill, New York, expanded edition.
P.M.T. Broersen (1995). A comparison of transfer function estimators. IEEE Trans. Instrum. Measurem., 44, pp. 657-661.
R.G. Hakvoort and P.M.J. Van den Hof (1995). Consistent parameter bounding identification for linearly parametrized model sets. Automatica, 31, pp. 957-969.
H. Hjalmarsson (1993). Aspects on Incomplete Modelling in System Identification. Dr. Dissertation, Dept. Electrical Engineering, Linkoping University, Sweden.
G.M. Jenkins and D.G. Watts (1968). Spectral Analysis and its Applications. Holden-Day, Oakland, CA.
L. Ljung (1999). System Identification - Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, Second edition.
L. Ljung and T. Glad (1994). Modeling of Dynamic Systems. Prentice Hall, Englewood Cliffs, NJ.
A.V. Oppenheim and R.W. Schafer (1989). Discrete-time Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
R. Pintelon and J. Schoukens (2001). System Identification - A Frequency Domain Approach. IEEE Press, Piscataway, NJ, USA, ISBN 0-7803-6000-1.
M.B. Priestley (1981). Spectral Analysis and Time Series. Academic Press, London, UK.
T. Soderstrom and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead, U.K.
C.W. Therrien (1992). Discrete Random Signals and Statistical Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
Chapter 4
Identification by approximate realization
4.1 Introduction
In the problem of identifying dynamical models on the basis of experimental data, a historical distinction has been made between two different approaches. Initially this distinction was completely based on the type of experimental signals that were assumed available:
- measurements of transient signals, such as pulse or step responses, of a dynamical system;
- measurements of general input and output signals of a dynamical system.
The most general (second) situation has led to the development of the class of prediction error identification methods, which will be extensively treated in Chapter 5.
The situation of availability of transient signals, like pulse responses, of a dynamical process has classically been handled by a different approach, called minimal or approximate realization. In this approach a state space model is obtained directly from the measured signals. Moreover, this estimate is obtained with techniques that result from realization theory and that rely on numerical linear algebra operations, rather than on criterion optimization techniques.
In a later extension, approximate realization techniques have also been generalized to handle general input/output types of signals, rather than only transient signals like pulse responses. This approach, called subspace identification, is attracting increasing attention, not least because of the relative simplicity of handling multivariable (multi-input, multi-output) identification problems.
In this chapter, we will first discuss the construction of models on the basis of transient signals. To this end the (exact) realization algorithm will be presented in Section 4.2.1. This algorithm underlies the methods that apply to noise-disturbed data, leading to the approximate realization algorithm, discussed in Section 4.3. Subspace identification will be discussed in Section 4.4.
The minimal realization problem amounts to constructing, from a given sequence of pulse response samples
$$\{g(t)\}_{t=0,1,\ldots} \quad (4.1)$$
a state space model (A, B, C, D) of minimal state dimension with this pulse response. This problem actually is not an identification type of problem. It represents the transformation of one model (pulse response) into another model (state-space form), where the two models are required to be equivalent, in the sense that they reflect the same transfer function G(z).
As the methods to be discussed are not limited to scalar systems, we will directly handle the multivariable situation of m inputs and p outputs, so that $u(t) \in \mathbb{R}^m$, $y(t) \in \mathbb{R}^p$, $g(t) \in \mathbb{R}^{p\times m}$, $G \in \mathbb{R}^{p\times m}(z)$, and the state space matrices: $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{p\times n}$ and $D \in \mathbb{R}^{p\times m}$, where n is the state dimension.
In this multivariable situation the sequence {g(t)} can actually not be thought of as the output of the system to one single excitation signal. Column number j in g(t) reflects the response of the p output signals to a pulse signal applied to input number j at time t = 0. So actually m experiments have to be performed in order to construct the sequence $\{g(t)\}_{t=0,\ldots}$. This sequence is denoted as the sequence of Markov parameters of the multivariable system G(z).
The two representations (4.1) and (4.2),(4.3) relate to the corresponding transfer function according to:
$$G(z) = \sum_{t=0}^{\infty} g(t)z^{-t} \quad (4.4)$$
and
$$G(z) = D + C(zI-A)^{-1}B, \quad (4.5)$$
and by applying pulse input signals to (4.2) it follows that the sequence of Markov parameters satisfies
$$g(t) = \begin{cases} CA^{t-1}B & t \ge 1\\ D & t = 0.\end{cases} \quad (4.6)$$
A state space model (A, B, C, D) is called a realization of G(z) if it satisfies (4.5), and it is called minimal if it additionally has a minimal state dimension n.
Definition 4.2.1 (McMillan degree) The McMillan degree of the dynamical system G(z) is defined as the state dimension of any minimal realization of G(z).
When equating the two representations (4.4) and (4.5), two statements can be made directly:
- The initial state x(0) in (4.2) does not play any role in the problem. This is caused by the fact that model equivalence is considered in terms of transfer functions, and a transfer function only reflects the dynamic system response of systems that are initially at rest (x(0) = 0).
- The relation
$$D = g(0)$$
is immediate, and so the problem of constructing the matrix D can be separated from the construction of (A, B, C).
The procedure to construct a minimal state space model from a sequence of Markov parameters is due to Ho and Kalman (1966), and is based on operations on the matrix
$$H_{n_r,n_c}(G) = \begin{bmatrix} g(1) & g(2) & g(3) & \cdots & g(n_c)\\ g(2) & g(3) & g(4) & & \vdots\\ g(3) & g(4) & g(5) & & \\ \vdots & & & \ddots & \\ g(n_r) & \cdots & & & g(n_r+n_c-1)\end{bmatrix}, \quad (4.7)$$
which is referred to as the (block) Hankel matrix of pulse response samples. A Hankel matrix is characterized by the property that the elements on the skew diagonals are the same.
The Hankel matrix (4.7) exhibits two important properties:
1. If G(z) has a realization (A, B, C, D), then
$$H_{n_r,n_c}(G) = \Gamma_{o,n_r}(C,A)\,\Gamma_{c,n_c}(A,B) \quad (4.8)$$
with
$$\Gamma_{o,n_r}(C,A) = \begin{bmatrix} C\\ CA\\ \vdots\\ CA^{n_r-1}\end{bmatrix}, \quad (4.9)$$
$$\Gamma_{c,n_c}(A,B) = \begin{bmatrix} B & AB & \cdots & A^{n_c-1}B\end{bmatrix}. \quad (4.10)$$
Note that if A has dimension $n\times n$, then the matrix $\Gamma_{o,n}(C,A)$ is known as the observability matrix and $\Gamma_{c,n}(A,B)$ as the controllability¹ matrix of the state space model (A, B, C, D). Equation (4.8) directly follows from substituting (4.6) into the Hankel matrix (4.7).
2. If G(z) has McMillan degree n, then
$$\mathrm{rank}\, H_{n_r,n_c}(G) = n \quad \text{for all } n_r, n_c \ge n. \quad (4.11)$$
This last property follows from the Cayley-Hamilton theorem C.2. It shows that from the (n+1)-st block row and block column of H onwards, every block row (or column) can be written as a linear combination of the previous n block rows (or columns). Thus the rank of H will not increase with increasing dimensions.
In order to construct a minimal state space realization on the basis of a sequence of Markov parameters, one can now proceed as follows.
Suppose we have given a Hankel matrix of sufficiently large dimensions such that (4.11) holds true. Then for any (full rank) matrix decomposition
$$H_{n_r,n_c} = H_1 H_2$$
there exist matrices A, B, C of an n-dimensional state space model, such that
- the matrix C can be extracted by taking the first p rows of the matrix $H_1$;
- the matrix B can be extracted by taking the first m columns of the matrix $H_2$.
which is simply obtained from the original Hankel matrix by shifting the matrix one block column to the left, or equivalently one block row upwards.
Using the expression (4.6) for g(t), it can directly be verified that with $H_1$ and $H_2$ satisfying (4.17) and (4.18), the shifted Hankel matrix can be written as:
$$\vec{H} = H_1\,A\,H_2 \quad (4.20)$$
where we have dropped subscripts and arguments to simplify notation. Note that this shifted Hankel matrix can also be constructed simply on the basis of the available Markov parameters $\{g(2), \ldots, g(n_r+n_c)\}$.
Because $H_1$ and $H_2$ have full column (resp. row) rank, it follows that there exist matrices $H_1^+$, $H_2^+$ such that
$$H_1^+ H_1 = H_2 H_2^+ = I_n.$$
$H_1^+$ is referred to as the left pseudo-inverse of $H_1$, and $H_2^+$ as the right pseudo-inverse of $H_2$, and $I_n$ is the $n\times n$ identity matrix. Note that $H_1$ and $H_2$ do not have any (regular) inverses because they are not square matrices. The pseudo-inverses are determined as:
$$H_1^+ = (H_1^T H_1)^{-1}H_1^T, \qquad H_2^+ = H_2^T(H_2 H_2^T)^{-1},$$
so that from (4.20) the matrix A follows as $A = H_1^+\,\vec{H}\,H_2^+$.
Summarizing, the matrices B and C follow directly from a full rank decomposition of the Hankel matrix. The matrix A is constructed on the basis of a shifted Hankel matrix.
A reliable numerical procedure both for the full rank decomposition of H and for the construction of the pseudo-inverses $H_1^+$ and $H_2^+$ is the singular value decomposition (see Definition B.1). An SVD provides a decomposition
$$H = U_n\Sigma_n V_n^T \quad (4.25)$$
where $U_n$ and $V_n$ are unitary matrices, i.e. $U_n^T U_n = I_n$ and $V_n^T V_n = I_n$, and $\Sigma_n$ is a diagonal matrix with positive entries $\sigma_1 \ge \cdots \ge \sigma_n$, referred to as the singular values.
The choice
$$H_1 = U_n\Sigma_n^{1/2} \quad (4.26)$$
$$H_2 = \Sigma_n^{1/2}V_n^T \quad (4.27)$$
1. Construct a large Hankel matrix with row and column dimensions higher than the McMillan degree that one expects to be required.
2. Apply an SVD and determine the rank n from the number of nonzero singular values.
3. Construct $H_1 = U_n\Sigma_n^{1/2}$ and $H_2 = \Sigma_n^{1/2}V_n^T$.
4. Construct C from the first p rows of $H_1$.
5. Construct B from the first m columns of $H_2$.
6. Construct $A = \Sigma_n^{-1/2}U_n^T\,\vec{H}\,V_n\Sigma_n^{-1/2}$, with $\vec{H}$ the shifted Hankel matrix.
7. Set D = g(0).
The use of the singular value decomposition for the decomposition of the Hankel matrix
was introduced in Zeiger and McEwen (1974).
The state space model that is obtained through this minimal realization algorithm does not exhibit a specific structure; generally all four matrices (A, B, C, D) will be filled with estimated coefficients. As it is known from linear systems theory that an m-input, p-output system with McMillan degree n can be represented by maximally n(p+m)+pm coefficients (see e.g. Guidorzi (1981)), this has motivated work on minimal realization algorithms that provide state space models with specific (canonical) structures, as developed in Ackermann (1971) and Bonivento and Guidorzi (1971).
As mentioned above, the state space model provided by the Ho/Kalman algorithm does not exhibit an explicitly specified structure. However, when observing the relations (4.26) and (4.27), it follows that the obtained realization will satisfy
$$\Gamma_o^T\Gamma_o = \Gamma_c\Gamma_c^T = \Sigma. \quad (4.33)$$
Note that
$$\Gamma_o^T\Gamma_o = \sum_{k=0}^{n_r-1}(A^T)^k C^T CA^k \quad \text{and} \quad (4.34)$$
$$\Gamma_c\Gamma_c^T = \sum_{k=0}^{n_c-1} A^k BB^T(A^T)^k. \quad (4.35)$$
If the underlying dynamical system is stable, the sums on the right hand side converge for $n_r, n_c \to \infty$ to
$$\sum_{k=0}^{\infty}(A^T)^k C^T CA^k = Q \quad (4.36)$$
$$\sum_{k=0}^{\infty} A^k BB^T(A^T)^k = P \quad (4.37)$$
satisfying the Lyapunov equations
$$A^T QA + C^T C = Q \quad (4.38)$$
$$APA^T + BB^T = P. \quad (4.39)$$
For the state space model obtained by the Ho/Kalman algorithm, relation (4.33) holds true, which shows that when the row and column dimensions of the Hankel matrix tend to infinity, the observability and controllability Gramians of the state space model will satisfy
$$P = Q = \Sigma, \quad (4.40)$$
with $\Sigma$ a diagonal matrix. A realization satisfying this condition is called a balanced realization, a notion which is essentially introduced in Mullis and Roberts (1976) and further exploited also for model reduction procedures in Moore (1981). In a balanced realization the states have been transformed to such a form that every state is as controllable as it is observable. Consequently the states can be ordered in terms of their contribution to the input-output properties of the dynamical system. Since its introduction, the notion of balanced realizations has been explored extensively in several areas of systems and control theory. For a more extensive treatment of balanced realizations see e.g. Skelton (1988).
Consider the pulse response sequence g(0) = 0, $g(t) = (0.5)^{t-1}$ for $t \ge 1$. As can easily be checked, this reflects the pulse response of a first-order system with one pole in z = 0.5. An exact minimal realization follows from applying the Ho/Kalman algorithm described in this section.
First a Hankel matrix is constructed that is sufficiently large:
$$H_{3,3} = \begin{bmatrix} 1 & 0.5 & 0.25\\ 0.5 & 0.25 & 0.125\\ 0.25 & 0.125 & 0.0625\end{bmatrix}. \quad (4.41)$$
The rank of this Hankel matrix is 1, as the second and third row (column) are obtained simply by multiplying the previous row (column) by 0.5. As a result, the state dimension of any minimal realization will be equal to 1.
A full rank decomposition of this matrix can directly be written down as:
$$H_{3,3} = \begin{bmatrix} 1\\ 0.5\\ 0.25\end{bmatrix}\begin{bmatrix} 1 & 0.5 & 0.25\end{bmatrix}. \quad (4.42)$$
The B and C matrices are set equal to the first elements of the matrices in the decomposition: B = 1, C = 1. For the construction of A we need the shifted Hankel matrix:
$$\vec{H}_{3,3} = \begin{bmatrix} 0.5 & 0.25 & 0.125\\ 0.25 & 0.125 & 0.0625\\ 0.125 & 0.0625 & 0.03125\end{bmatrix}. \quad (4.43)$$
When premultiplying the left and right hand side with $[1\ 0\ 0]$ and postmultiplying with $[1\ 0\ 0]^T$, the result follows: A = 0.5. Setting D = g(0) = 0, this delivers the solution (A, B, C, D) = (0.5, 1, 1, 0).
Note that we have constructed the full rank decomposition of $H_{3,3}$ in (4.42) by writing it down directly. Such a full rank decomposition is far from unique. E.g. we can multiply the first factor in (4.42) by a constant and at the same time divide the second factor by the same constant. In that case the realization that is found will be different, but the input/output properties in terms of its Markov parameters and transfer function will be the same.
In more complex situations it is not trivial how to write down a full rank decomposition.
The automated procedure through SVD would in this example lead to:
$$H_{3,3} = U\Sigma V^T = \begin{bmatrix} 0.8729 & 0.0000 & 0.4880\\ 0.4364 & 0.4472 & -0.7807\\ 0.2182 & -0.8944 & -0.3904\end{bmatrix}\begin{bmatrix} 1.3125 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}\begin{bmatrix} 0.8729 & 0.0000 & 0.4880\\ 0.4364 & 0.4472 & -0.7807\\ 0.2182 & -0.8944 & -0.3904\end{bmatrix}^T, \quad (4.46)$$
or equivalently
$$H_{3,3} = \begin{bmatrix} 0.8729\\ 0.4364\\ 0.2182\end{bmatrix}\,1.3125\,\begin{bmatrix} 0.8729 & 0.4364 & 0.2182\end{bmatrix}. \quad (4.47)$$
With this decomposition, the B and C matrices will be constructed as
$$B = C = 0.8729\cdot\sqrt{1.3125} = 1.$$
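This example can be verified numerically with a few lines of MATLAB (a sketch; the SVD sign convention may flip B and C simultaneously, which leaves the transfer function unchanged):

    H  = [1 0.5 0.25; 0.5 0.25 0.125; 0.25 0.125 0.0625];
    Hs = [0.5 0.25 0.125; 0.25 0.125 0.0625; 0.125 0.0625 0.03125];
    [U,S,V] = svd(H);                    % one nonzero singular value: rank 1
    n  = 1;
    Un = U(:,1:n); Sn = S(1:n,1:n); Vn = V(:,1:n);
    C  = Un(1,:)*sqrt(Sn);               % first p (=1) rows of Un*Sn^(1/2)
    B  = sqrt(Sn)*Vn(1,:)';              % first m (=1) columns of Sn^(1/2)*Vn'
    A  = Sn^(-1/2)*Un'*Hs*Vn*Sn^(-1/2);  % yields A = 0.5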
(a) There exists a unique extension $\{g(t)\}_{t=N+1,\ldots,\infty}$ within the class of all extensions having a minimal McMillan degree;
(b) provided that
$$\mathrm{rank}\, H_{n_r+1,n_c} = \mathrm{rank}\, H_{n_r,n_c+1} = \mathrm{rank}\, H_{n_r,n_c}. \quad (4.49)$$
In words, the proposition states that for a finite sequence of Markov parameters we can construct very many extensions to infinity, leading to many different dynamical systems. Among all these systems we only look at the systems with minimal McMillan degree. In this class of systems with minimal McMillan degree there is exactly one unique element, under the condition as formulated in part (b). This latter condition (b) implies that there exists some kind of linear dependency among the Markov parameters in the given finite sequence, as a result of which the rank condition can be satisfied. Part (b) states that when we have given a Hankel matrix composed of Markov parameters $\{g(t)\}_{t=0,\ldots,N-1}$, then the
rank of this Hankel matrix does not increase if we extend the matrix with either one block row or one block column. This reflects the dependency among the Markov parameters as meant above.
In the situation as formulated in the Proposition, the minimal McMillan degree of the system that matches the finite sequence of Markov parameters is equal to
$$\mathrm{rank}\, H_{n_r,n_c}.$$
This result follows directly from the fact that in the given situation the Ho/Kalman algorithm will provide a minimal realization of the finite sequence of Markov parameters, having a minimal state dimension equal to this rank.
As a result it follows that we could have applied the minimal realization algorithm on the basis of only the Markov parameters g(0), g(1) and g(2), employing the Hankel matrices
$$H_{1,1} = 1, \qquad \vec{H}_{1,1} = 0.5. \quad (4.51)$$
A full rank decomposition of $H_{1,1}$ is of course $1\cdot 1$, leading to C = B = 1 and $A = 1\cdot\vec{H}_{1,1}\cdot 1 = 0.5$.
This result is intuitively clear, as it actually states that a first-order system is uniquely characterized by three coefficients. This is in correspondence with the parametrization of first-order systems in terms of fractions of polynomials:
$$G(z) = \frac{b_0 + b_1 z^{-1}}{1 + a_1 z^{-1}} \quad (4.52)$$
where three coefficients represent the first-order system.
When a finite sequence of Markov parameters
$$\{g(t)\}_{t=0,\ldots,N} \quad (4.53)$$
is obtained from measurements, it will generally not be exactly related to the low-order dynamical system that underlies the Markov parameters.
For instance, suppose that this finite sequence reflects the measurement of pulse responses of a dynamical system with McMillan degree n; then through all kinds of (small) measurement errors, g(t) will not be exactly equal to the Markov parameters of this n-dimensional system. Consequently, when constructing a Hankel matrix as discussed in section 4.2.2, the rank of this Hankel matrix will generically be equal to the smallest row/column dimension of the matrix; the dependency that is required to hold among the Markov parameters will not be present. As a result of this phenomenon, it will not be possible to construct an n-dimensional system that exactly matches the given finite sequence (4.53).
Several algorithms have been developed to deal with this situation. First we will discuss an algorithm that can be applied to (noise-disturbed) pulse response data. With some modifications the algorithm can also be applied to step response data, as shown in section 4.3.3.
Suppose that we have constructed a Hankel matrix H_{n_r,n_c} on the basis of the available Markov parameters, such that n_r + n_c = N. What we need is an approximating (Hankel) matrix that matches the given Hankel matrix as closely as possible, and that has a rank that is smaller than min(n_r, n_c). If this situation were reached, we could apply the standard Ho/Kalman algorithm to the approximating matrix, and construct our realization.
This basic reasoning has led to the following approximate realization algorithm.
Approximate realization algorithm of Kung.
1. Given the finite sequence {g(t)}_{t=1,...,N}, construct a Hankel matrix H_{n_r,n_c} such that n_r + n_c = N.
2. Apply an SVD:
H_{n_r,n_c} = U Σ V^T  (4.54)
and evaluate the singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(n_r,n_c)}. Evaluate how the singular values decrease with growing index, and decide on a number n of singular values that are significant; the remaining singular values will be neglected.
3. Construct an approximating rank n matrix according to
H^(n) = U_n Σ_n V_n^T
by only retaining the significant singular values. Here
U_n = U [ I_n ; 0 ],   V_n = V [ I_n ; 0 ],   Σ_n = [ I_n  0 ] Σ [ I_n ; 0 ].
4. Apply the Ho/Kalman algorithm to this matrix of reduced rank, i.e. construct the realization:
   Construct C from the first p rows of U_n Σ_n^{1/2};
   Construct B from the first m columns of Σ_n^{1/2} V_n^T;
   Construct A according to
   A = Σ_n^{-1/2} U_n^T H̄ V_n Σ_n^{-1/2},  (4.55)
   where H̄ is the shifted Hankel matrix with the original Markov parameters.
Actually this algorithm is a small variation of the method of Kung (1978), as discussed in Damen and Hajdasinski (1982). In the original method of Kung the matrix A is constructed by solving an equation based on shifted versions of the (reduced rank) controllability/observability matrices, rather than using the original Hankel matrix in a shifted version. The practical differences, however, are moderate.
The approximate realization algorithm contains two approximation steps. The most obvious one is step 3, where a rank reduction of the Hankel matrix is achieved. A second approximation is somewhat hidden in step 4 too. This is due to the fact that the rank-reduced matrix H^(n) does not necessarily have a (block) Hankel structure. As a result, the matrices constructed in step 4 will not generate exactly the same Markov parameters as the elements of H^(n). A numerical sketch of the algorithm is given below.
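The following Python fragment sketches the algorithm for the SISO case. It is an illustration under assumed conventions (the Markov parameters g(1), ..., g(N) stored in a zero-based array, and n_r chosen as N/2); the function name is hypothetical and not part of the original text.

import numpy as np
from scipy.linalg import hankel, svd

def kung_realization(g, n):
    # g: Markov parameters g(1), ..., g(N) in a zero-based array (SISO case)
    # n: model order = number of retained singular values
    N = len(g)
    nr = N // 2
    nc = N - nr                                # nr + nc = N
    H = hankel(g[:nr], g[nr-1:N-1])            # H_{nr,nc}, entries g(i+j-1)
    Hbar = hankel(g[1:nr+1], g[nr:N])          # shifted Hankel matrix
    U, s, Vt = svd(H)
    Un, Vtn, d = U[:, :n], Vt[:n, :], np.sqrt(s[:n])
    C = (Un * d)[:1, :]                        # first (p = 1) row of Un Sigma_n^{1/2}
    B = (d[:, None] * Vtn)[:, :1]              # first (m = 1) column of Sigma_n^{1/2} Vn^T
    A = (Un / d).T @ Hbar @ (Vtn.T / d)        # Sigma_n^{-1/2} Un^T Hbar Vn Sigma_n^{-1/2}
    return A, B, C

# on the noise-free first order sequence g(t) = 0.5^(t-1) this recovers
# A = 0.5 and B = C = 1, up to the sign conventions of the SVD
g = 0.5 ** np.arange(10)
print(kung_realization(g, 1))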
Example 4.3.1 For illustrating the algorithm discussed in this section, we have constructed the pulse response sequence of the dynamical system G0 as described in Example 3.4.1, and we have added an additive white noise disturbance to this response. Both the exact and the noise disturbed pulse response sequences are depicted in the upper left plot in figure 4.1. Based on the perturbed sequence of length 70, a Hankel matrix has been constructed, of which the singular values are examined in order to determine an appropriate order of the model. The singular values are shown in the upper right plot of the figure. The consequence of the noise disturbance is that all singular values will be unequal to 0. One now has to decide which singular values are considered to originate from the dynamical system, and which should be considered to be caused by the noise. From the singular value plot it seems appropriate to choose n = 3 as the principal underlying rank of the Hankel matrix, as the first three singular values are (much) larger than the following ones. As a result n = 3 is chosen for the order of the model to be constructed. The two lower plots in figure 4.1 show the pulse and step response of the third order model together with the responses of the system G0.
The results show that a third order model is reasonably well able to capture the dynamics of the underlying system. However, it is of course not exact, as it has a lower complexity (lower order) than the original system. The two modes of the system that have not been modelled were more or less hidden in the noise disturbance on the pulse response.
Figure 4.1: Upper left: exact (solid) and noise disturbed (dashed) pulse response of G0; upper right: singular values of H_{35,35}. Lower left: pulse response of G0 (solid) and of 3rd order model (dashed); lower right: step response of G0 (solid) and of 3rd order model (dashed).
4.3.3 Approximate realization based on step response data
When (only) step response measurements are available from a dynamical system (and this situation is practically very relevant, e.g. in the situation of industrial production processes), it is advantageous to have the ability to construct an approximate model directly on the basis of these data.
A straightforward method could be to difference the step response data in order to arrive at pulse response data:
g(t) = s(t) − s(t−1),  (4.57)
and subsequently apply the realization methods from the previous section. However this is not attractive, since the differencing operation (characterized by a transfer operation (z − 1)/z) will introduce an amplification of high-frequency noise on the measurement data.
As an alternative it is possible to directly use the step response data in an approximate realization method that is a slightly modified version of the methods discussed previously. This modification is due to Van Helmont et al. (1990).
As a starting point, note that the basic ingredient of the Ho/Kalman algorithm is a full rank decomposition of the Hankel matrix (4.7) as shown in (4.13). The original Hankel matrix H_{n_r,n_c} is postmultiplied with a nonsingular n_c·m × n_c·m matrix T_{n_c}, given by

         [ I_m  I_m  ···  I_m ]
         [ 0    I_m  ···  I_m ]
T_{n_c} = [ ·         ···  ·   ]   (4.58)
         [ 0    0    ···  I_m ]

leading to the matrix R_{n_r,n_c} := H_{n_r,n_c} T_{n_c}, whose entries are partial sums of Markov parameters and can therefore be constructed directly from step response samples.
Since R_{n_r,n_c} is obtained from H_{n_r,n_c} by postmultiplication with a (square) nonsingular matrix, it follows that both matrices share a number of properties, such as their rank. Moreover, because of the special structure of T_{n_c} it can also easily be verified that if we shift R_{n_r,n_c} over one block row upwards, dual to H̄_{n_r,n_c} in (4.19), then this shifted matrix satisfies:
R̄_{n_r,n_c} = H̄_{n_r,n_c} T_{n_c}.  (4.63)
Approximate realization algorithm based on step response data.
1. Given the finite sequence of step response samples, construct the matrix R_{n_r,n_c} such that n_r + n_c = N.
2. Apply an SVD:
R_{n_r,n_c} = U Σ V^T  (4.68)
and evaluate the singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(n_r,n_c)}. Evaluate how the singular values decrease with growing index, and decide on a number n of singular values that are significant; the remaining singular values will be neglected.
3. Construct an approximating rank n matrix according to
R^(n) = U_n Σ_n V_n^T
by only retaining the significant singular values.
4. Apply the realization step of the previous algorithm to this matrix of reduced rank, with the shifted matrix R̄_{n_r,n_c} taking the role of H̄.
Similar to the algorithm in the previous section, this algorithm also contains two approximation steps, in step 3 and step 4.
The type of approximation that is performed here can be quite different from the approximation in the pulse response approximate realization algorithm. This means that when constructing reduced order models, the pulse response realization may lead to models with different dynamics than the models obtained by the step response realization algorithm. A construction sketch is given below, and the difference is illustrated in example 4.3.2.
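As a sketch of the step response based algorithm, the fragment below constructs R_{n_r,n_c} and its shifted version directly from step response samples. It relies on the relations R(i,j) = s(i+j−1) − s(i−1) and R̄(i,j) = s(i+j) − s(i), which follow from R = H T_{n_c} with H(i,j) = g(i+j−1) and s(t) = Σ_{k≤t} g(k); these index conventions, and the function name, are assumptions made here for illustration.

import numpy as np
from scipy.linalg import svd

def step_realization(s, n, nr, nc):
    # s: step response samples s(0), ..., s(nr+nc), with s(0) = 0 (strictly proper system)
    # R[i,j] = s(i+j-1) - s(i-1): partial sums of Markov parameters
    R = np.array([[s[i+j-1] - s[i-1] for j in range(1, nc+1)] for i in range(1, nr+1)])
    # Rbar[i,j] = s(i+j) - s(i): the shifted version, cf. (4.63)
    Rbar = np.array([[s[i+j] - s[i] for j in range(1, nc+1)] for i in range(1, nr+1)])
    U, sv, Vt = svd(R)
    Un, Vtn, d = U[:, :n], Vt[:n, :], np.sqrt(sv[:n])
    C = (Un * d)[:1, :]                  # first row of Un Sigma_n^{1/2}
    B = (d[:, None] * Vtn)[:, :1]        # first column of Sigma_n^{1/2} Vn^T
    A = (Un / d).T @ Rbar @ (Vtn.T / d)  # Sigma_n^{-1/2} Un^T Rbar Vn Sigma_n^{-1/2}
    return A, B, C

Note that B can still be read off as the first column of Σ_n^{1/2} V_n^T, since the first column of the (transformed) controllability matrix is left unchanged by T_{n_c}.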
Example 4.3.2 In order to illustrate the difference in results between the pulse response based and the step response based algorithm, the two algorithms are applied to noise-free pulse and step responses of the data generating system G0, as also chosen in the previous example 4.3.1. Pulse and step response sequences were generated with a length of 70 samples. Figure 4.2 shows the singular values of H_{35,35} and of R_{35,35} on both linear and logarithmic scale.
Figure 4.2: Singular values of the pulse realization (Hankel) matrix H_{35,35} (×) and of the step realization matrix R_{35,35} (o), in linear scale (left figure) and logarithmic scale (right figure).
As the system has order 5, only 5 singular values contribute essentially. However it appears that the relation in magnitude among the first 5 singular values is quite different for the two matrices.
In order to illustrate the approximation properties of the two algorithms, figure 4.3 shows pulse responses and step responses of first and third order approximate models obtained by either of the two algorithms.
The results for the third order models (lower plots in figure 4.3) show that the pulse response algorithm leads to far better results than the step response algorithm. This is particularly due to the fact that the high-frequency response of the system is more dominantly present in the pulse response than in the step response of the system. The results for the first order models (upper plots) show that the low-frequency behaviour is better retained by the step response algorithm. A similar observation can be made here: the low-frequency behaviour of the system is more dominantly present in the step response of the system than in its (finite time) pulse response.
Figure 4.3: Upper: pulse response (left) and step response (right) of G0 (solid), and of first order models obtained by pulse response approximate realization (dash-dotted) and by step response approximate realization (dashed). Lower: ditto for third order models.
Bibliography
J.E. Ackermann (1971). Canonical minimal realization of a matrix of impulse response
sequences. Information and Control, Vol. 19, pp. 224-231.
C. Bonivento and R.P. Guidorzi (1971). Canonical input-output description of linear multivariable systems. Ricerche di Automatica, Vol. 2, pp. 72-83.
A.A.H. Damen and A.K. Hajdasinski (1982). Practical tests with different approximate realizations based on the singular value decomposition of the Hankel matrix. In: G.A. Bekey and G.W. Saridis (Eds.), Identification and System Parameter Estimation 1982. Proc. 6th IFAC Symp. Identif. and Syst. Param. Estim., Washington DC, pp. 903-908.
G. Golub and C.F. Van Loan (1983). Matrix Computations. Johns Hopkins Univ. Press, Baltimore.
G. Golub and C. Reinsch (1970). Singular value decomposition and least squares solutions.
Num. Math., Vol. 14, pp. 403-420. Repr. in J.H. Wilkinson and C. Reinsch (Eds.),
Handbook for Automatic Computation, vol. II, Linear Algebra. Springer Verlag, 1971,
Contribution I/10, pp. 134-151.
R.P. Guidorzi (1981). Invariants and canonical forms for systems structural and parametric identification. Automatica, Vol. 17, pp. 117-133.
B.L. Ho and R.E. Kalman (1966). Effective construction of linear state-variable models from input/output functions. Regelungstechnik, Vol. 14, pp. 545-548.
Chapter 5

Prediction error identification methods

5.1 Introduction
Identification of parametric models is basically motivated by the fact that in many applications and in many situations in which we need a representation of a dynamical system in terms of a model, nonparametric representations such as Bode plots, Nyquist plots, step responses etc. are not sufficient. When considering several application areas in which models are used, such as
Process simulation
Control design
it is apparent that nonparametric models can only be of limited use. In process simulation, for example, it would be rather inefficient to simulate output signals on the basis of nonparametric system representations; in control design there are very efficient methods for designing controllers on the basis of (limited order) dynamical models. The capturing of the essential dynamics of a given dynamic system in a limited number of coefficients is the basic issue here.
This chapter deals with the identification of parametric linear dynamic models using so-called prediction error methods. Prediction error methods have become a widespread technique for system identification. Having been developed over the last three decades, the stage has now been reached that (relatively) user-friendly software tools are available for solving these prediction error identification problems. In the engineering community the System Identification Toolbox of MATLAB¹ has become the basic tool for dealing with these problems.
There are several books treating the class of prediction error methods. In this chapter we will give a brief overview of techniques, results, and interpretations, staying quite close to the material presented in Ljung (1987). In general we will restrict attention to the case of processes having scalar input and output signals. Despite problems of identifiability of parametrizations, most results and discussions have straightforward extensions to the multivariable case.
¹ Trademark of The MathWorks, Inc.
5.2 Systems and disturbances
The systems that we consider are described by
y(t) = G(q)u(t) + v(t)  (5.1)
where G(z) is a proper rational transfer function that is analytic (i.e. has no poles) in |z| ≥ 1, which means that the dynamical system is stable. The system G thus has the property that it is LTIFD (linear time-invariant finite-dimensional). In (5.1) {y(t)} is the (measured) output signal, {u(t)} the (measured) input signal, and v(t) is a nonmeasurable disturbance term.
The dynamical system representation in terms of the transfer function G is of course an abstraction from reality. Many systems that are the subject of investigation will not satisfy the LTIFD property, but will exhibit (small) deviations from this framework. Additionally, measurements that are taken from a dynamical system will always be contaminated with disturbance signals. In order to deal with all these effects, the disturbance term v(t) is added to the system equations. This signal v(t) may reflect:
process disturbances;
measurement noise.
Actually the disturbance term v(t) is supposed to reflect all deviations from the LTIFD framework that occur in the measurement data.
It has to be noted that the location of the disturbance term v on the output of the system is not a severe restriction, as long as we deal with linear systems. Because of the principle of superposition, input, process and output disturbances can all be characterized as an output disturbance.
The question of how to model the disturbance signal v(t) is quite an important one. The model should be able to reflect all disturbance terms that one expects to be present in a given situation. Several disturbance paradigms are possible. In the area of so-called bounded error models the disturbance v(t) is e.g. modelled as a signal that belongs to the class of signals characterized by
|v(t)| ≤ v̄,  v̄ ∈ IR,  (5.2)
formally denoted as |v| ≤ v̄. For an overview of these so-called bounded error models see e.g. Milanese et al. (1996).
In prediction error identification, as will be treated in this chapter, the disturbance v is modelled as a zero-mean stationary stochastic process with a (rational) spectral density Φ_v(ω). This means that we can write
v(t) = H(q)e(t)  (5.3)
where H(z) is a proper rational transfer function that is stable, and {e(t)} is a sequence of zero mean, identically distributed, independent random variables (white noise).
The transfer function H(z) will be restricted to be monic, i.e. H(z) has a Laurent expansion around z = ∞,
H(z) = 1 + Σ_{k=1}^{∞} h_k z^{-k}  (5.4)
and
R_v(τ) := Ē v(t)v(t−τ)
        = Σ_{k=0}^{∞} Σ_{j=0}^{∞} h(k)h(j) E[e(t−k)e(t−τ−j)]
        = Σ_{k=0}^{∞} Σ_{j=0}^{∞} h(k)h(j) σ_e² δ(k−τ−j)
        = σ_e² Σ_{k=0}^{∞} h(k)h(k−τ).  (5.7)
This supports the statement made above that the second order properties of v (and thus also of y) are dependent on H and σ_e².
5.3 Prediction
5.3.1 Introduction
The capability of a model to predict future values of signals will appear to be an important property when evaluating models as candidate representatives of systems to be modelled. In this section 5.3 we will analyze how general models of the form presented in the previous section can serve as predictors. Here we will specifically direct attention to dynamical systems of the form (5.1).
Consider the system description
y(t) = G(q)u(t) + H(q)e(t).  (5.8)
Writing H(q) = 1 + [H(q) − 1], this can be rewritten as
y(t) = G(q)u(t) + [H(q) − 1]e(t) + e(t).
Note that the first two terms on the right hand side contain signals that are available at time t−1, provided that G(q) is strictly proper (i.e. it does not have a constant (direct feedthrough) term). By substituting e(t) = H(q)^{-1}[y(t) − G(q)u(t)] into the second term of the latter equation, one obtains:
y(t) = G(q)u(t) + [1 − H^{-1}(q)][y(t) − G(q)u(t)] + e(t)
and consequently
y(t) = H^{-1}(q)G(q)u(t) + [1 − H^{-1}(q)]y(t) + e(t).
The assumption that H(z) is proper, monic and has a stable inverse implies that H^{-1}(z) is also proper, monic and stable, i.e.
1/H(z) = 1 + d_1 z^{-1} + d_2 z^{-2} + ···
with {d_i} a sequence which tends to 0. As a direct result of this, the expression
H^{-1}(q)G(q)u(t) + [1 − H^{-1}(q)]y(t)
is fully determined by G(q), H(q) and by past observations y^{t−1}, u^{t−1}. This expression will be referred to as the one-step-ahead predictor of y(t), and denoted as
ŷ(t|t−1) = H^{-1}(q)G(q)u(t) + [1 − H^{-1}(q)]y(t).  (5.10)
As a result:
y(t) = ŷ(t|t−1) + e(t).
This equation represents a decomposition of y(t) into a term ŷ(t|t−1) that is available at time instant t−1, and a term e(t) that is unavailable at that time.
The best prediction of y(t) is not uniquely determined. Several choices are possible, depending on what is called best. Let the probability density function of e(t) be denoted by f_e(x). Then we can analyze the conditional probability density function of y(t), given y^{t−1}, u^{t−1}. With f_e(x)Δx ≈ P(x ≤ e(t) ≤ x + Δx) it follows that this conditional density equals f_e(x − ŷ(t|t−1)).
One possible choice is to pick that value of y(t) for which f_e(y(t) − ŷ(t|t−1)) attains its maximum value. This is called the maximum a posteriori (MAP) prediction. Another predictor, which will be used throughout this chapter, is the conditional expectation of y(t), or E(y(t)|y^{t−1}, u^{t−1}). Because of the fact that Ee(t) = 0 for all t, it follows that
E(y(t)|y^{t−1}, u^{t−1}) = ŷ(t|t−1).
In other words, the one-step-ahead predictor (5.10) has the interpretation of being the best one-step-ahead predictor in the sense of the conditional expectation. And this holds irrespective of the probability density function of e, as long as Ee(t) = 0.
The conditional expectation of y(t), as denoted above, has an additional property; it is the best one-step-ahead prediction of y(t) if we consider a quadratic error criterion. Let ȳ(t) be an arbitrary function of (y^{t−1}, u^{t−1}). Then
Ē [y(t) − ȳ(t)]² ≥ Ē [y(t) − ŷ(t|t−1)]².
Remark 5.3.1 If G(z) is not strictly proper, i.e. if G(z) has a constant (direct feedthrough) term, then G(q)u(t) is not completely known at time instant t−1, as it also contains a term that depends on u(t). In that situation, the one-step-ahead prediction ŷ(t|t−1) is defined by E(y(t)|y^{t−1}, u^t).
The prediction result implies that one can predict the next sample of a zero-mean stochastic (noise) process given observations from the past. This is illustrated in the following example.
Example 5.3.2 Consider a noise process v(t) = H(q)e(t) with
H(q) = 1 + cq^{-1}.
Then
H(z) = 1 + cz^{-1} = (z + c)/z,   H(z)^{-1} = z/(z + c).
The latter function is stable for |c| < 1. In that case
H^{-1}(z) = 1/(1 + cz^{-1}) = Σ_{k=0}^{∞} (−cz^{-1})^k = Σ_{k=0}^{∞} (−c)^k z^{-k}.  (5.13)
The expression for the one-step-ahead predictor now follows from (5.10) with G(q) = 0 and y = v, leading to
v̂(t|t−1) = [1 − Σ_{k=0}^{∞} (−c)^k q^{-k}] v(t)
         = (cq^{-1}/(1 + cq^{-1})) v(t) = (c/(1 + cq^{-1})) v(t−1),
leading to
v̂(t|t−1) = c v(t−1) − c v̂(t−1|t−2).  (5.14)
As a result the one-step-ahead predictor can be constructed through a recursive relation based on {v(t)}. 2
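The recursion (5.14) is easily simulated. The following sketch, with an assumed value c = 0.8 and unit variance white noise, illustrates that the prediction error v(t) − v̂(t|t−1) approaches the white noise source e(t), with variance 1 rather than var v = 1 + c².

import numpy as np

rng = np.random.default_rng(0)
c, N = 0.8, 10000
e = rng.standard_normal(N)
v = e.copy()
v[1:] += c * e[:-1]                 # v(t) = e(t) + c e(t-1)

vhat = np.zeros(N)                  # recursion (5.14), started from vhat(0) = 0
for t in range(1, N):
    vhat[t] = c * v[t-1] - c * vhat[t-1]

eps = v - vhat                      # prediction error, approaches e(t)
print(np.var(v), np.var(eps))       # about 1 + c^2 = 1.64 versus 1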
The one-step-ahead predictor has been derived for the situation that the data generating mechanism that generates y(t) is described by a model (G(q), H(q)) that is known. If y satisfies this condition, then apparently the prediction error
ε(t) := y(t) − ŷ(t|t−1)  (5.15)
can only be calculated a posteriori, when the measurement y(t) has become available. Substituting equation (5.10) for the one-step-ahead prediction, the prediction error reads:
ε(t) = H^{-1}(q)[y(t) − G(q)u(t)].  (5.16)
This prediction error ε(t) is exactly that component of y(t) that could not have been predicted at time instant t−1. For this reason it is also called the innovation at time t. When we substitute the system relation (5.8) into (5.16) it is immediate that ε(t) = e(t), but this of course only holds if the data generating system indeed equals (G(q), H(q)).
Remark 5.3.3 With a similar line of reasoning an expression can be derived for the k-step-ahead (k > 1) prediction of y(t), i.e. E(y(t)|y^{t−k}, u^t). In that case, in expression (5.10) for the one-step-ahead prediction, H^{-1}(q) has to be replaced by H̄_k(q)H^{-1}(q) with
H̄_k(q) = Σ_{ℓ=0}^{k−1} h(ℓ) q^{-ℓ}.  (5.17)
The one-step-ahead prediction of y(t) can be calculated when past data y^{t−1}, u^t are available. However it has to be noted that this implies the assumption that an infinitely long data sequence is available. In general the two expressions H^{-1}(q)G(q) and [1 − H^{-1}(q)] in (5.10) will reflect series expansions in q^{-1} of infinite length. If we have data available from a certain time moment on, e.g. for t ≥ 0, then (5.10) can still provide a prediction of y(t) by assuming u(t), y(t) = 0 for t < 0. However, this will only be an approximation of the optimal predictor, as it assumes zero initial conditions. The exact predictor, based on the conditional expectation, can be calculated with the Kalman filter, incorporating a possibly nonzero initial state. Nevertheless we will deal with the predictor (5.10) in the sequel of this chapter, thereby implicitly assuming that the initial conditions are 0.
Remark 5.3.4 As an alternative for the expression of the one-step-ahead prediction ŷ(t|t−1) (5.10), we will also use the notation
ŷ(t|t−1) = W(q) [u(t)  y(t)]^T  (5.18)
with W(q) := [H^{-1}(q)G(q)   1 − H^{-1}(q)].
Figure 5.2: Block diagram of the presumed data generating system (G0(q), H0(q)) and the predictor model, in which the prediction error ε(t) is constructed by filtering y(t) − G(q)u(t) through H(q)^{-1}.
A particular model thus corresponds to the specification of G(q), H(q) and f_e(·). For this model we can compute the one-step-ahead prediction ŷ(t|t−1). Since this prediction does not depend on f_e(·), the prediction properties of the model are determined by (G(q), H(q)) only. Models that are specified by only these two rational functions will be called predictor models.
How can we employ these models in the context of an identification problem?
Predictor models provide a (one-step-ahead) prediction of the output y(t), given data from the past. The prediction error that is made can serve as a signal that indicates how well a model is able to describe the dynamics that underlie a measured data sequence. This can be visualized in the block diagram in Figure 5.2, in which equations (5.3), (5.8) and (5.16) are combined.
In this respect the predictor model can be considered as a mapping from data u, y to a prediction error ε. If the data generating system (G0, H0) is equal to the model (G, H), then ε = e. However, also in the situation that the model differs from the data generating system, the model retains its role as a predictor, and provides a predicted output signal and a prediction error.
We will consider sets of models, denoted as
M := {(G(q, θ), H(q, θ)), θ ∈ Θ}  (5.20)
where a real valued parameter θ ranges over a subset Θ of IR^d. It is assumed that this model set is composed of predictor models that exhibit the same properties as assumed in the foregoing part of this chapter, such as properness and stability of G(z), H(z) and stable invertibility and monicity of H(z).
Underlying the set of models, there is a parametrization that determines the specific relation between a parameter θ and a model M within M.
The parametrization of M is defined as a surjective mapping:
M : Θ → M  (5.21)
with Θ denoting the parameter set. There are many different ways of parametrizing sets of models as meant above. In prediction error identification methods a very popular way of parametrization is in terms of fractions of polynomials in q^{-1} for G(q, θ) and H(q, θ). As an example we will first discuss the so-called ARX model set.
A(q^{-1}, θ) = 1 + a_1 q^{-1} + a_2 q^{-2} + ··· + a_{na} q^{-na}
B(q^{-1}, θ) = b_0 + b_1 q^{-1} + b_2 q^{-2} + ··· + b_{nb−1} q^{-nb+1}
with
θ := [a_1 a_2 ··· a_{na} b_0 b_1 ··· b_{nb−1}]^T  (5.22)
such that
M = {(G(q, θ), H(q, θ)) | G(q, θ) = B(q^{-1}, θ)/A(q^{-1}, θ), H(q, θ) = 1/A(q^{-1}, θ), θ ∈ IR^{na+nb}}.
2
The acronym ARX can be explained by writing a corresponding model in its equation form:
y(t) = (B(q^{-1}, θ)/A(q^{-1}, θ)) u(t) + (1/A(q^{-1}, θ)) e(t)  (5.23)
or equivalently
A(q^{-1}, θ) y(t) = B(q^{-1}, θ) u(t) + e(t).  (5.24)
AR refers to the AutoRegressive part A(q^{-1}, θ)y(t) in the model, while X refers to the eXogenous term B(q^{-1}, θ)u(t).
The model set is completely determined once the integers na, nb and the parameter set Θ have been specified.
Note that an important aspect of the parametrization M is that it determines whether and in which way, for every model in the set, the two rational functions G(z, θ) and H(z, θ) are related to each other. The choice for an ARX model set implies that all models in the set have a common denominator in G(z, θ) and H(z, θ). Since this can be considered a structural property of the model set concerned, we will also refer to this model set as having an ARX model structure. This type of structural property of a model set will appear to have important consequences, as will be shown in the sequel of this chapter.
Apart from the ARX model structure, there exist a number of other model structures that are frequently applied in identification problems. The most important ones are listed in Table 5.1, in which each model structure is given in equation form and denoted with an appropriate name.

Table 5.1: The most common model structures.
  FIR     y(t) = B(q^{-1})u(t) + e(t)
  ARX     A(q^{-1})y(t) = B(q^{-1})u(t) + e(t)
  ARMAX   A(q^{-1})y(t) = B(q^{-1})u(t) + C(q^{-1})e(t)
  OE      y(t) = (B(q^{-1})/F(q^{-1}))u(t) + e(t)
  BJ      y(t) = (B(q^{-1})/F(q^{-1}))u(t) + (C(q^{-1})/D(q^{-1}))e(t)

Note that in Table 5.1 the polynomials A(q^{-1}), B(q^{-1}), C(q^{-1}), D(q^{-1}) and F(q^{-1}) are polynomials in q^{-1} with respective degrees na, nb−1, nc, nd, nf, and that A(q^{-1}), C(q^{-1}), D(q^{-1}) and F(q^{-1}) are monic². The coefficients of the polynomials are collected in a parameter vector θ, which - for brevity - has not been denoted in the table.
The choice of a specific model structure in an identification problem can be a very important issue. Choosing the wrong structure may lead to identified models that are bad. The choice for a specific model structure can be based on a priori information on the process to be modelled (knowledge about how and where disturbance signals enter the process); however it can also appear during the identification and validation procedure (e.g. residual tests indicate that a specific extension of the model structure is required). The choice may also be dictated by other arguments dealing with specific properties of the model structure. Concerning this latter situation, we have to mention two properties of model structures that will appear to be important.
² The degree of polynomial B has been chosen nb − 1 rather than nb in order to comply with the use of this variable in the System Identification MATLAB Toolbox. In this way all integer variables na - nf reflect the number of unknown parameters in the respective polynomials.
Linearity-in-the-parameters.
The model structures ARX and FIR have the property that the one-step-ahead prediction ŷ(t|t−1), (5.10), is a linear function of the polynomial coefficients that constitute the parameter vector. For an ARX model structure the prediction becomes
ŷ(t|t−1) = B(q^{-1})u(t) + (1 − A(q^{-1}))y(t).  (5.25)
In the right hand side of this expression all terms are simple products of one data sample and one polynomial coefficient. This implies that ŷ(t|t−1) is an expression that is linear in the unknown θ.
A consequence of this linearity is that a least squares identification criterion defined on the prediction errors ε(t) is a quadratic function in θ. As a result, there is an analytical expression for the optimal parameter that minimizes the quadratic criterion. The identification problem is a so-called linear regression problem, which is very attractive from a computational point of view. This situation will be given further attention in section 5.6.
Note that for a FIR model structure the same situation holds, now limited to the special case of A(q^{-1}) = 1. Actually the FIR model structure is simply a special case of an ARX model structure.
Uniform stability.
Definition 5.4.2 formalizes when a parametrized model set M with parameter set Θ is uniformly stable. Without going into detail, we state that parametrized model sets having a model structure as discussed in this section will be uniformly stable if the parameter set Θ is confined to a region such that the polynomials A(z^{-1}, θ), F(z^{-1}, θ), C(z^{-1}, θ) and D(z^{-1}, θ) have no zeros for |z| ≥ 1.
The parametrizations of models discussed so far are limited to fractions of polynomials. However, other parametrizations are also possible, e.g. a parametrization in terms of state space representations like
x(t+1) = A(θ)x(t) + B(θ)u(t) + K(θ)e(t)
y(t) = C(θ)x(t) + e(t).  (5.26)
The optimal one-step-ahead predictor for this model (under zero initial state conditions) can be calculated in the same way as before by using (5.10).
Another parametrization that will appear to be attractive from an identification point of view generalizes the FIR model structure, by using a more general expansion than the one that is used in the FIR structure:
y(t) = Σ_{k=0}^{ng} c_k F_k(q) u(t) + e(t).  (5.27)
In this expression {F_k(z)}_{k=1,...} is chosen to be an (orthonormal) basis for the Hilbert space of all stable systems, and the expansion coefficients c_k are collected in the parameter vector θ. This parametrization is referred to as ORTFIR and is further discussed in chapter 6.
5.5 Identification criterion
As an identification criterion we consider the quadratic cost function
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε(t, θ)²  (5.30)
where the relation between ε(t, θ) and the data Z^N is determined by the chosen parametrized model set.
The estimated parameter θ̂_N is defined as
θ̂_N = arg min_θ V_N(θ, Z^N)  (5.31)
which means that θ̂_N is the value of θ that minimizes V_N(θ, Z^N). This criterion is known as the least squares criterion. Sometimes it will implicitly be assumed that this minimizing argument of V_N(θ, Z^N) is unique. If this is not the case, arg min is considered to be the set of minimizing arguments.
The choice of a quadratic criterion function actually dates back to Gauss' work in the 18th century. It is a function that generally leads to satisfactory results. Besides, it is very attractive from a computational point of view. In case a model set is chosen that has a model structure which is linear-in-the-parameters, the quadratic criterion function (5.30) leads to a quadratic optimization problem, which generally has a unique solution that can be calculated analytically (linear regression problem).
Remark 5.5.2 One disadvantage of the quadratic criterion function is its relatively high sensitivity to outliers in the data, i.e. data samples that have extremely high amplitudes. This lack of robustness against bad data has been a motivation for studying alternative criterion functions that are robust with respect to unknown variations in the probability density function, see e.g. Huber (1981). 2
Remark 5.5.3 There are several possible generalizations of the quadratic criterion function. In a general form one can write:
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ℓ(ε(t, θ), θ, t)  (5.32)
where ℓ(·) is a specific norm on IR. This norm can itself also be parametrized by θ, as well as being time-varying, e.g. for dealing with measurement data that are considered to be of varying reliability over the length of the time interval. The latter situation is specifically applied in time-recursive identification methods, see e.g. Ljung and Söderström (1983). 2
Remark 5.5.4 In the multivariable case, and particularly in the case of multiple output signals, the prediction error ε will be vector valued. The multivariable analogon of the quadratic criterion function can then be written as
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε^T(t, θ) ε(t, θ).  (5.33)
Finally we would like to pay attention to one additional degree of freedom that is available in prediction error identification methods.
Instead of having the criterion function (5.30) operate directly on the prediction error ε(t, θ), there is the possibility of first filtering the prediction error through a stable linear filter L(q), leading to
ε_F(t, θ) = L(q) ε(t, θ).  (5.34)
Similar to the situation discussed before, the quadratic (or any alternative) criterion function can now be applied to the filtered prediction error ε_F(t, θ), leading to
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε_F(t, θ)².  (5.35)
With a proper choice of L one can influence the criterion function (and consequently also the identified model) in a specific way. Here we like to mention that through the choice of L one is able to enhance or suppress the relative importance of specific frequency regions in the criterion function.
By choosing a filter L that enhances the low frequency components in ε and suppresses high frequency components, minimization of (5.35) will lead to an identified model that is accurate in the particular low frequency region enhanced by L and less accurate in the high frequency region suppressed by L. By suppressing high frequency components in ε, model inaccuracies in this frequency region will not play a major role in the criterion (5.35).
Employing the expression (5.16) for the prediction error in the function (5.35) we can write:
ε_F(t, θ) = L(q) H(q, θ)^{-1} [y(t) − G(q, θ)u(t)],
leading to the following observations:
1. The effect of prefiltering the prediction error in the criterion function (5.35) is identical to changing the noise model of the models in the model set from H(q, θ) to H(q, θ)L(q)^{-1}.
2. If G(z, θ) and H(z, θ) are scalar transfer functions, then the effect of prefiltering the prediction error is identical to prefiltering the input and output data, i.e.
ε_F(t, θ) = [H(q, θ)]^{-1} [y_F(t) − G(q, θ)u_F(t)]  (5.37)
with
y_F(t) = L(q)y(t)  (5.38)
u_F(t) = L(q)u(t).  (5.39)
The situation is depicted in Figure 5.3, showing the two equivalent diagrams.
Figure 5.3: Two equivalent block diagrams: (left) the prediction error ε(t, θ), generated by the predictor model (G(q, θ), H(q, θ)^{-1}), is filtered by L(q); (right) the input and output signals u, y are prefiltered by L(q) to u_F, y_F before entering the predictor model, yielding the same ε_F(t, θ).
As mentioned above, through a proper choice of L one can enhance or suppress specific aspects of the measured data. In the next chapter more attention will be paid to the design aspects of choosing the filter L.
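The equivalence in observation 2 can be checked numerically. The sketch below uses a hypothetical first order system and model, and an assumed prefilter L(q) = 1 − 0.5q^{-1}; filtering the prediction error and prefiltering the data give identical results up to machine precision.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N = 500
u, e = rng.standard_normal(N), rng.standard_normal(N)
# data from a first order system: G0 = 0.5 q^-1/(1 - 0.8 q^-1), H0 = 1/(1 - 0.8 q^-1)
y = lfilter([0, 0.5], [1, -0.8], u) + lfilter([1], [1, -0.8], e)

def pred_error(y, u, a=-0.7, b=0.4):
    # ARX predictor model: eps = A y - B u with A = 1 + a q^-1, B = b q^-1
    return lfilter([1, a], [1], y) - lfilter([0, b], [1], u)

L = [1, -0.5]                                                 # prefilter L(q)
eps_F1 = lfilter(L, [1], pred_error(y, u))                    # route (5.34)
eps_F2 = pred_error(lfilter(L, [1], y), lfilter(L, [1], u))   # route (5.37)-(5.39)
print(np.max(np.abs(eps_F1 - eps_F2)))                        # ~ 1e-15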
5.6 Linear regression
Some model structures have the property of being linear-in-the-parameters; as discussed earlier in this chapter, this refers to the structures FIR and ARX. In these cases the specific property of linearity-in-the-parameters can be exploited in solving the least squares identification problem. In this section it will be shown how a linear least squares regression problem is solved. Some properties of the identification method are discussed and also brief attention is given to a closely related instrumental variable identification method.
For model structures that are linear-in-the-parameters, the one-step-ahead predictor can be written as ŷ(t|t−1; θ) = φ^T(t)θ, with regression vector φ(t) = [−y(t−1) ··· −y(t−na) u(t) ··· u(t−nb+1)]^T in the ARX case. The least squares criterion then becomes
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} [y(t) − φ^T(t)θ]²  (5.44)
where
y(t) − φ^T(t)θ = ε(t, θ),
the prediction error.
The criterion function (or loss function) (5.44) is a function that is quadratic not only in φ(t), but also in θ. This latter property implies that the problem of minimizing (5.44) as a function of θ is a quadratic optimization problem, which in general has a unique solution that can be obtained analytically. For a scalar valued θ this is schematically depicted in Figure 5.4.
The construction of this solution can simply be done by setting the first derivative of the quadratic function to zero and calculating the parameter value that corresponds to this zero first derivative. This solution strategy is shown next.
∂V_N(θ, Z^N)/∂θ = (2/N) Σ_{t=1}^{N} (∂/∂θ)[y(t) − φ^T(t)θ] · [y(t) − φ^T(t)θ]  (5.45)
                = −(2/N) Σ_{t=1}^{N} [y(t) − φ^T(t)θ] φ(t).  (5.46)
As a result, setting this derivative to zero delivers
(1/N) Σ_{t=1}^{N} φ(t)y(t) = [(1/N) Σ_{t=1}^{N} φ(t)φ^T(t)] θ  (5.47)
and, provided the indicated inverse exists,
θ̂_N = [(1/N) Σ_{t=1}^{N} φ(t)φ^T(t)]^{-1} (1/N) Σ_{t=1}^{N} φ(t)y(t).  (5.48)
Figure 5.4: Quadratic criterion function V_N(θ) for a scalar parameter θ.
Denoting
R(N) := (1/N) Σ_{t=1}^{N} φ(t)φ^T(t)   ((na+nb) × (na+nb) matrix)  (5.49)
f(N) := (1/N) Σ_{t=1}^{N} φ(t)y(t)     ((na+nb) × 1 vector)  (5.50)
this leads to
θ̂_N = R(N)^{-1} f(N).
Note that in the ARX situation φ(t) contains delayed input and output samples, and as a consequence R(N) and f(N) are composed of sample covariance functions of {y(t)} and {u(t)}, where
R(N) = [ R_yy^N  R_yu^N ; R_uy^N  R_uu^N ]  (5.51)
and
f(N) = [ f_yy^N ; f_yu^N ].  (5.52)
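In code, the estimate (5.48) amounts to building the regression matrix and solving a linear least squares problem. The sketch below uses numpy's lstsq, which is numerically preferable to forming R(N)^{-1} f(N) explicitly; the function name and array conventions are illustrative assumptions.

import numpy as np

def arx_ls(y, u, na, nb):
    # ARX least squares: A(q^-1) y(t) = B(q^-1) u(t) + e(t),
    # theta = [a_1..a_na, b_0..b_{nb-1}]^T, predictor yhat(t) = phi(t)^T theta
    n = max(na, nb - 1)
    rows, Y = [], []
    for t in range(n, len(y)):
        phi = np.concatenate([-y[t-na:t][::-1],      # -y(t-1), ..., -y(t-na)
                              u[t-nb+1:t+1][::-1]])  # u(t), ..., u(t-nb+1)
        rows.append(phi)
        Y.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(Y), rcond=None)
    return theta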
Remark 5.6.1 Note that the notion of linearity-in-the-parameters is not related to a notion of linearity/nonlinearity of the models that are identified. All models considered in this chapter so far have linear dynamics. The notion of linearity-in-the-parameters only reflects a property of a parametrization / model structure.
Remark 5.6.2 The linear regression problem can also directly be used for identifying models with nonlinear dynamics, provided that the structure of the nonlinearity is specified beforehand. Consider for instance the model parametrization
y(t) = a_1 y(t−1)u(t−1) + b_1 u²(t) + b_2 u(t−1)u(t−2) + e(t)  (5.54)
with φ^T(t) = [y(t−1)u(t−1)  u²(t)  u(t−1)u(t−2)] and θ = [a_1 b_1 b_2]^T. Defining φ^T(t)θ as the predictor, a least squares solution can be obtained along exactly the same lines as sketched above.
Remark 5.6.3 The linear regression representation ŷ(t|t−1; θ) = φ^T(t)θ only applies to model structures that are linear-in-the-parameters. For other model structures, such as ARMAX or OE, an attempt to write the prediction in this form leads to an expression of the form
ŷ(t|t−1; θ) = φ^T(t, θ)θ.  (5.55)
In this expression the linearity property is lost, as φ has become a function of θ.
Denoting Y_N := [y(1) y(2) ··· y(N)]^T and Φ_N := [φ(1) φ(2) ··· φ(N)]^T, it follows that:
V_N(θ, Z^N) = (1/N) |Y_N − Φ_N θ|²  (5.57)
and
θ̂_N = (Φ_N^T Φ_N)^{-1} Φ_N^T Y_N.  (5.58)
In order to retain expressions that are computationally feasible for quasi-stationary input signals, it is preferable to use the expression
θ̂_N = [(1/N) Φ_N^T Φ_N]^{-1} (1/N) Φ_N^T Y_N.  (5.59)
In this way the two separate components of the right hand side remain bounded for increasing values of N.
θ̂_N is the least squares solution of an overdetermined set of equations:
Y_N = Φ_N θ.  (5.60)
By minimizing the quadratic errors on this matrix equation, an orthogonal projection Ŷ_N = Φ_N θ̂_N is determined, which maps Y_N onto the space spanned by the columns of Φ_N. This situation is schematically depicted in Figure 5.5, for N = 3 and 2 parameters θ^(1), θ^(2).
Figure 5.5: Orthogonal projection of Y_N onto the plane spanned by the columns φ_1, φ_2 of Φ_N, yielding the prediction Ŷ_N (case N = 3, two parameters).
Note that
V_N(θ̂_N, Z^N) = (1/N) |Y_N − Φ_N θ̂_N|²  (5.61)
              = (1/N) [Y_N − Ŷ_N]^T [Y_N − Ŷ_N]  (5.62)
with Ŷ_N = Φ_N θ̂_N.
Consistency
Suppose that data is available from a data generating system
y(t) = (B_0(q^{-1})/A_0(q^{-1})) u(t) + (1/A_0(q^{-1})) w_0(t)  (5.63)
where θ_0 denotes the coefficient vector of the system. Note that if w_0 is a white noise process, then the data generating system has an ARX structure. In regression form the system reads
y(t) = φ^T(t)θ_0 + w_0(t).  (5.64)
When identifying a model through an ARX model structure with the same regression vector, this implies that
ŷ(t|t−1; θ) = φ^T(t)θ  (5.65)
and the least-squares parameter estimate is obtained by (5.48). When substituting the system output (5.64) into the parameter estimator (5.48) it follows that
θ̂_N = R(N)^{-1} (1/N) Σ_{t=1}^{N} φ(t)y(t)
    = θ_0 + R(N)^{-1} (1/N) Σ_{t=1}^{N} φ(t)w_0(t).  (5.66)
Consistency of the estimator follows if plim_{N→∞} θ̂_N = θ_0. In order to achieve this, two conditions have to be satisfied:
(a) R̄ := plim_{N→∞} R(N) should exist and be nonsingular;
(b) h̄ := plim_{N→∞} (1/N) Σ_{t=1}^{N} φ(t)w_0(t) should be 0.
Under the given assumptions these limits satisfy
R̄ = Ē φ(t)φ^T(t)
h̄ = Ē φ(t)w_0(t).
The first condition on R̄ will appear to be satisfied if the input signal u is sufficiently exciting the process (such that all process dynamics indeed exhibit themselves in the measurement data), and the model is not overparametrized (there are no pole-zero cancellations introduced in the step from (5.63) to (5.64)).
The second condition on h̄ shows that
Ē [−y(t−1) ··· −y(t−na)  u(t) ··· u(t−nb+1)]^T w_0(t) = 0.  (5.67)
If the data generating process indeed has an ARX structure, i.e. the noise w_0 is a white noise process, then the above condition is satisfied.
Note that y(t−i), i > 0, is correlated with w_0(t−j), j ≥ i; however, if w_0 is a white noise process, y(t−i) is not correlated with w_0(t) and so the above condition is satisfied. In this case plim_{N→∞} θ̂_N = θ_0, which implies that
lim_{N→∞} E(θ̂_N) = θ_0.
If the data generating process has a FIR structure (and so does the model), then condition (b) on h̄ reads
Ē [u(t) ··· u(t−nb+1)]^T w_0(t) = 0.  (5.68)
In this situation a sufficient condition for consistency of G(q, θ̂_N) is that the noise process w_0 is uncorrelated with the input signal components in the regression vector. This condition allows w_0 to be non-white.
This shows that an ARX model estimator is consistent provided that the data generating system also has an ARX structure. If the noise enters the data in a different way (if w_0 is non-white), then an ARX estimator will be biased. This bias is caused by the fact that the noise term is correlated with some of the components of the regression vector.
Asymptotic variance
The variance of an estimator is a measure of the variation that occurs in the estimate under different realizations of the noise process e. The variance of the least squares parameter estimator can be written as
cov(θ̂_N) = E[(θ̂_N − θ_0)(θ̂_N − θ_0)^T]  (5.69)
using the fact that E θ̂_N = θ_0. When considering the matrix notation of the data generating process:
Y_N = Φ_N θ_0 + W_N  (5.70)
with W_N = [e(1) e(2) ··· e(N)]^T, and using (5.59), it follows that
cov(θ̂_N) = E{[(1/N)Φ_N^T Φ_N]^{-1} (1/N)Φ_N^T W_N W_N^T Φ_N (1/N) [(1/N)Φ_N^T Φ_N]^{-1}}  (5.71)
while the bias expression becomes:
E θ̂_N = θ_0 + E{[(1/N)Φ_N^T Φ_N]^{-1} (1/N)Φ_N^T W_N}.  (5.72)
Using the fact that for e a white noise process, under weak conditions,
E W_N W_N^T = σ_e² I,  (5.73)
which together with plim_{N→∞} (1/N)Φ_N^T Φ_N = R̄ shows that for N → ∞:
cov(θ̂_N) ≈ (σ_e²/N) R̄^{-1}.  (5.74)
Note that the parameter variance tends to zero for N → ∞. This is a direct consequence of the consistency property of the estimator. Expression (5.74) shows that the variance of the estimator can be reduced by obtaining a matrix R̄ that is well conditioned (i.e. its inverse does not grow rapidly). The choice of an appropriate input signal can be directed towards this goal. Equation (5.74) also shows the three basic elements that influence the variance of the estimator:
Noise power σ_e²: the more noise present on the data, the larger the parameter variance;
Length of the measurement interval: the more data points are measured, the smaller the parameter variance;
Input signal and input power: the higher the power of the input signal, the larger R̄ will become, and thus the smaller the parameter variance.
These effects are illustrated in the simulation sketch below.
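The following small Monte Carlo study is a sketch under assumed values (a hypothetical third order FIR system with white input, for which R̄ = σ_u² I); it shows the parameter variance decreasing proportionally to 1/N, in line with (5.74).

import numpy as np
rng = np.random.default_rng(2)
b0, sig_e, nb = np.array([1.0, 0.5, 0.25]), 0.1, 3

def fir_ls(y, u, nb):
    # FIR least squares with phi(t) = [u(t), ..., u(t-nb+1)]
    Phi = np.column_stack([u[nb-1-k:len(u)-k] for k in range(nb)])
    return np.linalg.lstsq(Phi, y[nb-1:], rcond=None)[0]

for N in (250, 1000, 4000):
    ests = []
    for _ in range(200):                      # 200 Monte Carlo runs
        u = rng.standard_normal(N)
        y = np.convolve(u, b0)[:N] + sig_e * rng.standard_normal(N)
        ests.append(fir_ls(y, u, nb))
    print(N, np.var(np.array(ests), axis=0))  # close to sig_e^2/(N sig_u^2) = 0.01/N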
If S ∉ M or G_0 ∉ G, the analysis of bias and variance becomes much more complicated.
Non-asymptotic properties
In the ARX situation of a stochastic regressor, the bias and variance of the estimator follow from (5.71) and (5.72). These expressions for bias and variance hold under the assumption that S ∈ M and for finite values of N.
If the regressor φ(t) is deterministic, as is the case for FIR models with a deterministic input signal, the statistical analysis of the estimator shows some particular simplifications. Denoting y_o(t) := G_0(q)u(t), i.e. the noise-free output signal, while y(t) = y_o(t) + v(t), it follows from the earlier expressions that
E θ̂_N = [(1/N)Φ_N^T Φ_N]^{-1} (1/N) Σ_{t=1}^{N} φ(t) y_o(t)  (5.75)
Cov(θ̂_N) = [(1/N)Φ_N^T Φ_N]^{-1} (1/N)Φ_N^T E[V_N V_N^T] Φ_N (1/N) [(1/N)Φ_N^T Φ_N]^{-1}  (5.76)
with the output noise signal vector V_N^T := [v(1) v(2) ··· v(N)]. Note that these expressions are valid for finite N and irrespective of conditions like G_0 ∈ G or S ∈ M; they even hold true in the situation G_0 ∉ G. However, if G_0 ∈ G, then there exists a θ_0 such that y_o(t) = φ^T(t)θ_0, thus leading to E θ̂_N = θ_0. As a result, in this case the estimator is unbiased also for finite values of N. If the noise signal v is a stationary white noise process with variance σ_v², the covariance expression reduces to
Cov(θ̂_N) = (σ_v²/N) [(1/N)Φ_N^T Φ_N]^{-1}.  (5.77)
This latter expression is closely related to the asymptotic parameter covariance matrix (5.74) that was presented for the general case in the situation S ∈ M.
Instrumental variable method
Closely related to the least squares estimator is the so-called instrumental variable (IV) estimator. With φ(t) the regression vector (vector with explanatory variables), the IV estimate θ̂_N^{IV} is determined as the solution of
(1/N) Σ_{t=1}^{N} ζ(t) ε(t, θ̂_N^{IV}) = 0  (5.80)
where ζ(t) is the so-called instrumental variable. Note that the least squares estimate θ̂_N, by (5.45)-(5.48), satisfies
(1/N) Σ_{t=1}^{N} φ(t) ε(t, θ̂_N) = 0.  (5.82)
Comparing this with (5.80) shows the close relation between the construction of the two estimates. When choosing ζ(t) equal to φ(t) for all t, the same estimates result.
Appropriate choices of instrumental variables are indicated by the related consistency analysis. This consistency analysis is very similar to the analysis for least squares estimators provided in section 5.6.4.
Assuming a data generating system represented by
y(t) = φ^T(t)θ_0 + w_0(t)  (5.83)
or equivalently
y(t) = (B_0(q^{-1})/A_0(q^{-1})) u(t) + (1/A_0(q^{-1})) w_0(t)  (5.84)
with B_0(q^{-1})/A_0(q^{-1}) = G_0(q), and {w_0(t)} any stationary stochastic process with rational spectral density, then substitution of the expression for y(t) into the instrumental variable estimator θ̂_N^{IV} (5.80) yields
θ̂_N^{IV} = [(1/N) Σ_{t=1}^{N} ζ(t)φ^T(t)]^{-1} [(1/N) Σ_{t=1}^{N} ζ(t)(φ^T(t)θ_0 + w_0(t))]  (5.85)
          = θ_0 + [(1/N) Σ_{t=1}^{N} ζ(t)φ^T(t)]^{-1} [(1/N) Σ_{t=1}^{N} ζ(t)w_0(t)].  (5.86)
Similar to the situation of the ARX least squares estimator, the instrumental variable estimator provides a consistent parameter estimate, plim_{N→∞} θ̂_N^{IV} = θ_0, under the following two conditions:
(a) plim_{N→∞} (1/N) Σ_{t=1}^{N} ζ(t)φ^T(t) exists and is nonsingular;
(b) plim_{N→∞} (1/N) Σ_{t=1}^{N} ζ(t)w_0(t) = 0.
Condition (a) implies that the instruments should be correlated with the variables in the regression vector φ(t); condition (b) implies that the instruments should be uncorrelated with the noise disturbance term w_0(t).
Note that in the situation that w_0 is a white noise process (ARX data generating system), the choice ζ(t) = φ(t) satisfies these conditions. However, if w_0 is not a white noise process, then the output-dependent entries of the regression vector φ(t) have to be replaced by other (instrumental) signals that are not correlated with the disturbance terms. A straightforward choice for these instruments is the use of additional delayed input samples:
ζ(t) = [u(t)  u(t−1)  ···  u(t−nb+1)  u(t−nb)  ···  u(t−nb−na+1)]^T.  (5.87)
Provided that the input signal is sufficiently exciting (see section 5.7.3), this choice of ζ(t) generally satisfies the above two conditions for consistency. This choice of ζ(t) is generally denoted as the basic IV method.
Under the given conditions the IV estimator will provide a consistent estimate of the ARX model parameters, even if the data generating system has a noise disturbance that does not satisfy the ARX structure.
For a more extensive discussion of IV estimation methods, the reader is referred to Söderström and Stoica (1983) and Ljung (1987). A sketch of the basic IV method is given below.
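The fragment below is an illustrative implementation of (5.80) with the instruments (5.87); the function name and array conventions are assumptions made here.

import numpy as np

def basic_iv(y, u, na, nb):
    # instruments zeta(t) = [u(t), ..., u(t-nb-na+1)]: delayed inputs only
    n = na + nb - 1
    Z, Phi, Y = [], [], []
    for t in range(n, len(y)):
        Phi.append(np.concatenate([-y[t-na:t][::-1], u[t-nb+1:t+1][::-1]]))
        Z.append(u[t-na-nb+1:t+1][::-1])
        Y.append(y[t])
    Z, Phi, Y = map(np.array, (Z, Phi, Y))
    # solve (1/N) sum zeta(t) [y(t) - phi(t)^T theta] = 0, cf. (5.80) and (5.85)
    return np.linalg.solve(Z.T @ Phi, Z.T @ Y)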
5.7 Experimental conditions
The experimental conditions concern:
(a) assumptions on the system that has generated the data, and
(b) assumptions on the excitation (input signal) that is present in the available data.
In this chapter the attention that is given to experimental situations will be directed towards the analysis of the identification methods. Issues of experiment setup and design are postponed until chapter 7.
Figure 5.6: Block diagram of data generating mechanism, according to Assumption 5.7.1.
Assumption 5.7.1 The data generating mechanism is as depicted in Figure 5.6, where
(a) {r(t)} is a bounded deterministic signal;
(b) {e(t)} is a sequence of independent random variables with zero mean and the property that E[e(t)]^4 ≤ C_e < ∞;
(c) the family of transfer functions k^(i)(z) = Σ_{k=0}^{∞} d^(i)(k) z^{-k}, i = 1, ..., 4, is uniformly stable, i.e. max_i |d^(i)(k)| ≤ d̄(k), with Σ_{k=0}^{∞} d̄(k) = C_d < ∞.
The fact that {r(t)} is assumed to be deterministic means that we consider {r(t)} as a given - measurable - data sequence that can be reproduced if we want to repeat the experiment. It does not exclude that the particular sequence {r(t)} is generated as a realization of a stochastic process. However, when dealing with probabilistic properties such as expected values, we will consider e as the only probabilistic source.
When discussing properties such as consistency, we need the concept and formulation of a "true" system, in order to be able to relate an estimated model to the true model. To this end we formalize the following assumption.
Assumption 5.7.2 There exist rational transfer functions G_0(z), H_0(z), with H_0(z) monic, stable and stably invertible, such that the data sequence Z^∞ satisfies:
y(t) = G_0(q)u(t) + H_0(q)e(t)
with e a white noise process with zero mean, variance σ_e², and satisfying condition (b) of Assumption 5.7.1.
This true system, determined by (G_0(q), H_0(q)), is denoted by S. 2
Note that in this assumption nothing is said about the order or McMillan degree of G_0, H_0. Finite dimensionality and linearity of the data generating system are the basic restrictions here.
Once we have a model set M, it will be important to distinguish whether the data generating system S can be represented exactly by an element of M.
Let M be induced by a parametrization with parameter set Θ.
We introduce the following notation:
Θ_T(S, M) := {θ ∈ Θ | (G(q, θ), H(q, θ)) = (G_0(q), H_0(q))}.
Whenever this set is nonempty, an exact representation of the system S is present in the model set M. Consequently this situation will be referred to as:
S ∈ M.  (5.93)
In the asymptotic analysis of prediction error identification methods we will pay attention to both situations S ∈ M and S ∉ M. One might argue whether it is of practical interest/relevance to consider the situation S ∈ M. Many processes that appear in nature will not be linear, time-invariant and finite dimensional, and consequently the models that we build for these processes - within the framework discussed here - will most of the time be abstractions from reality. However, even if this is the case, we may require from identification procedures that, when we are in a situation of S ∈ M, the real system can (at least asymptotically) be identified from data.
Definition 5.7.3 Let {u(t)} be a quasi-stationary signal, and let the n × n matrix R̄_n be defined as the symmetric Toeplitz matrix:

       [ R_u(0)     R_u(1)   ···  R_u(n−1) ]
       [ R_u(1)     R_u(0)   ···  R_u(n−2) ]
R̄_n =  [   ···                      ···    ]   (5.94)
       [ R_u(n−1)   ···   R_u(1)   R_u(0)  ]

Then {u(t)} is called persistently exciting of order n if R̄_n is nonsingular. 2
Example 5.7.4 Consider a set of models having a FIR model structure, leading to a prediction error
ε(t, θ) = y(t) − Σ_{k=1}^{nb} b_k u(t−k).  (5.95)
With V̄(θ) := Ē ε(t, θ)²  (5.96), it follows through analysis of the linear regression scheme that the minimizing parameter θ* is determined as the solution to the equation:

[ R_u(0)      R_u(1)   ···  R_u(nb−1) ] [ b_1    ]   [ R_yu(1)  ]
[ R_u(1)      R_u(0)   ···  R_u(nb−2) ] [ b_2    ]   [ R_yu(2)  ]
[   ···                       ···     ] [ ···    ] = [   ···    ]   (5.97)
[ R_u(nb−1)   ···    R_u(1)   R_u(0)  ] [ b_{nb} ]   [ R_yu(nb) ]

Note that uniqueness of the estimated parameter occurs if and only if the symmetric Toeplitz matrix has full rank. Alternatively we could state that the order of persistence of excitation of the input signal equals the number of FIR coefficients that can be identified uniquely with the identification criterion (5.96). 2
Proposition 5.7.5 A quasi-stationary signal {u(t)} with spectrum Φ_u(ω) is persistently exciting of order n if and only if, for all moving average filters
M_n(q) = b_1 q^{-1} + b_2 q^{-2} + ··· + b_n q^{-n},  (5.98)
the relation
|M_n(e^{iω})|² Φ_u(ω) = 0 for all ω  (5.99)
implies that M_n(e^{iω}) = 0 for all ω. 2
This is based on the observation that nonsingularity of R̄_n is equivalent to
b^T R̄_n b = 0  ⟺  b = 0,  (5.100)
with b the coefficient vector of M_n(q).
The interpretation of this result is that if u is persistently exciting of order n, there does not exist an n-th order moving average filter that reduces the signal u to zero.
As a consequence of this proposition it can be verified that persistence of excitation of order n is guaranteed if the spectrum Φ_u(ω) is unequal to zero in at least n points in the interval −π < ω ≤ π.
For example, a sinusoidal signal u(t) = sin(ω_0 t) has a spectrum that is nonzero only in the two frequencies ±ω_0, and the third order moving average filter
M_3(q) = q^{-1} − 2cos(ω_0) q^{-2} + q^{-3} = q^{-1}[1 − e^{iω_0} q^{-1}][1 − e^{-iω_0} q^{-1}]
reduces this signal to zero; a sinusoid is therefore persistently exciting of order 2.
The result of the example is appealing. A sinusoid principally exhibits two degrees of freedom: an amplitude and a phase, and by exciting a dynamical system with one sinusoid we can identify two system parameters.
Figure 5.7: Periodic block signal u(t) of Example 5.7.7.
Example 5.7.7 Consider the block signal u(t) depicted in Figure 5.7. The signal is periodic with a period of 6 time steps. It can easily be verified that
R_u(0) = 1,  R_u(1) = 1/3,  R_u(2) = −1/3,  R_u(3) = −1,
R_u(4) = −1/3,  R_u(5) = 1/3,  R_u(6) = 1,  etcetera.
According to definition 5.7.3 the Toeplitz matrix

       [  1     1/3  −1/3   −1  ]
       [  1/3    1    1/3  −1/3 ]
R̄_4 =  [ −1/3   1/3    1    1/3 ]   (5.103)
       [ −1    −1/3   1/3    1  ]

is singular: since R_u(3) = −R_u(0), the moving average filter q^{-1} + q^{-4} reduces the signal to zero, and indeed R̄_4 [1 0 0 1]^T = 0. The signal is therefore persistently exciting of order 3, in correspondence with the fact that its spectrum is nonzero in exactly three frequencies (±π/3 and π) in the interval −π < ω ≤ π. This can be verified numerically, as sketched below. 2
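A minimal numerical check of these two statements:

import numpy as np
from scipy.linalg import toeplitz

# autocovariance samples of the period-6 block signal
r = [1, 1/3, -1/3, -1, -1/3, 1/3]
print([np.linalg.matrix_rank(toeplitz(r[:n])) for n in (3, 4)])   # [3, 3]

# one period of the signal: spectral lines at +/- pi/3 and pi only
u = np.array([1, 1, 1, -1, -1, -1])
print(np.round(np.abs(np.fft.fft(u)), 3))   # nonzero at bins 1, 3, 5 only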
Considering the least squares parameter estimator
θ̂_N = arg min_θ V_N(θ, Z^N)  (5.104)
with V_N(θ, Z^N) the quadratic function (5.30), one of the questions that has to be raised concerns the limit properties of the criterion function V_N(θ, Z^N) as N → ∞. In this respect the following convergence result can be shown to hold.
Proposition 5.8.1 Let M be a model set, with parameter set Θ, that is uniformly stable, and let Z^∞ be a data sequence that is subject to Assumption 5.7.1.
Then
sup_{θ∈Θ} |V_N(θ, Z^N) − V̄(θ)| → 0 with probability 1 as N → ∞  (5.105)
where
V̄(θ) = Ē ε(t, θ)².  (5.106)
2
The proof of this proposition is somewhat outside the scope of this course; see e.g. Ljung (1987), Davis and Vinter (1985) for a thorough analysis.
The result states that the criterion function, defined as an average over time, converges to a criterion function that averages over an ensemble. Asymptotically the criterion function thus becomes independent of the specific realization of the noise sequence {v(t)} that has affected the data sequence Z^∞.
Since the convergence result is uniform in θ, it also implies that the minimizing arguments of both criterion functions converge:
Proposition 5.8.2 Consider the situation of Proposition 5.8.1, and let θ̂_N be defined by (5.104). Then
θ̂_N → θ* := arg min_θ V̄(θ) with probability 1 as N → ∞.  (5.107)
2
Since arg min V̄(θ) is not necessarily a single element, the convergence (5.107) can be a convergence into the set Θ_c of minimizing arguments of V̄(θ). This has to be interpreted as
inf_{θ̄∈Θ_c} |θ̂_N − θ̄| → 0 as N → ∞, with Θ_c = arg min_θ V̄(θ).
Note that for this convergence result it is not required that the real data generating system S is a minimizing argument of V̄(θ). Consequently the estimate θ̂_N converges to the best possible approximation of S that is achievable within the chosen model set. The quality of the approximation is taken in the sense of the criterion V̄(θ).
This is illustrated in the following example.
Example 5.8.3 Consider a data generating system determined by
G_0(q) = b_0 q^{-1}/(1 + a_0 q^{-1}),   H_0(q) = (1 + c_0 q^{-1})/(1 + a_0 q^{-1})  (5.108)
leading to
y(t) + a_0 y(t−1) = b_0 u(t−1) + e(t) + c_0 e(t−1)  (5.109)
with {u(t)} and {e(t)} independent unit variance white noise sequences.
We consider a model set with an ARX model structure, determined by
G(q, θ) = b q^{-1}/(1 + a q^{-1}),   H(q, θ) = 1/(1 + a q^{-1}),   θ = [a b]^T ∈ Θ = IR².  (5.110)
In the example a data generating system with an ARMAX structure has been modelled by an ARX model. This is the reason that the asymptotic parameter estimates are biased. In this situation the asymptotic ARX model that is identified becomes dependent on the specific experimental conditions that underlie the data taken from the system. The input signal now plays an important role in the estimation of the model. This is typical for situations in which the model structure and the structure of the data generating system do not match, as the following simulation sketch also suggests.
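The fragment below, with assumed parameter values, fits a first order ARX model to data generated by the ARMAX system (5.109); for c_0 ≠ 0 the estimates do not converge to (a_0, b_0).

import numpy as np
from scipy.signal import lfilter
rng = np.random.default_rng(3)

a0, b0, c0, N = -0.7, 1.0, 0.5, 100000
u, e = rng.standard_normal(N), rng.standard_normal(N)
# ARMAX data (5.109): y(t) + a0 y(t-1) = b0 u(t-1) + e(t) + c0 e(t-1)
y = lfilter([0, b0], [1, a0], u) + lfilter([1, c0], [1, a0], e)

# first order ARX fit with phi(t) = [-y(t-1), u(t-1)]
Phi = np.column_stack([-y[:-1], u[:-1]])
theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(theta)   # differs systematically from [a0, b0] = [-0.7, 1.0]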
Proposition 5.9.1 Let S be a data generating system and let Z^∞ be a data sequence corresponding to Assumptions 5.7.1, 5.7.2. Let M be a model set that is uniformly stable with parameter set Θ, such that S ∈ M, and with G(q, θ) structured according to
G(q, θ) = q^{-nk} (b_0 + b_1 q^{-1} + ··· + b_{nb−1} q^{-nb+1})/(1 + f_1 q^{-1} + ··· + f_{nf} q^{-nf}).  (5.118)
If the input signal {u(t)} is sufficiently exciting of order nb + nf, then θ* ∈ Θ_T(S, M). 2
Together with the convergence result that is shown in the previous section, this leads to the observation that in the given situation, for all ω:
G(e^{iω}, θ̂_N) → G_0(e^{iω}) and H(e^{iω}, θ̂_N) → H_0(e^{iω}) with probability 1 as N → ∞.
The consistency result of this prediction error method shows that consistency is obtained under conditions that are rather appealing: the model set should be able to describe the system exactly, and the input signal has to be sufficiently exciting in order to be able to extract all of the dynamics of the system from the external signals.
Note that the order of sufficient excitation of the input (nb + nf) is equal to the number of real-valued parameters to be estimated. This order, mentioned in the proposition, is a sufficient condition; in specific situations lower orders may also lead to consistency.
For consistency of the estimated parameter, we need the additional condition that the parametrization M : Θ → M is an injective mapping, i.e. M(θ_1) = M(θ_2) with θ_1, θ_2 ∈ Θ implies θ_1 = θ_2. Under this condition the solution set Θ_T(S, M) will only contain one element, being the real system parameter vector θ_0.
A proof of Proposition 5.9.1 is given in the Appendix.
G := {G(z, θ), θ ∈ Θ}.  (5.120)
We can now formalize the situation that the input/output transfer function G_0(z) of the data generating system can be modelled exactly within the model set, irrespective of whether the same holds for the noise transfer function H_0(z). We denote:
Θ_G(S, G) := {θ ∈ Θ | G(q, θ) = G_0(q)}  (5.121)
and whenever this set is nonempty this situation will be referred to as
G_0 ∈ G.  (5.122)
Proposition 5.9.2 Let S be a data generating system and let Z^∞ be a data sequence corresponding to Assumptions 5.7.1, 5.7.2. Let M be a model set with G and H parametrized independently, i.e.
M = {(G(z, ρ), H(z, η)),  θ = [ρ^T η^T]^T ∈ Θ}  (5.123)
and let G(z, ρ) be structured according to (5.118).
If {u(t)} and {e(t)} are uncorrelated (open-loop experiment), and if the input signal {u(t)} is sufficiently exciting of order nb + nf, then the asymptotic estimate satisfies G(q, θ*) = G_0(q).
2
With a similar reasoning as in the proof of Proposition 5.9.1, the first term is 0 if and only if θ ∈ Θ_G(S, G), provided that {u(t)} is persistently exciting of sufficient order. As the second term is independent of θ, this proves the result. 2
In the situation discussed, there exists a consistency result for the input/output transfer function G, irrespective of the modelling of the noise transfer function H. An important condition is that within the model set M the two transfer functions G and H are parametrized independently, through different parts of the parameter vector. This condition refers to properties of the model structures. Note that for model sets having an ARX structure this condition is not satisfied, because of the common denominator in the two transfer functions. As discussed already in section 5.4 there are three model structures that do have this property of independent parametrization: OE, FIR and BJ.
In this section we will discuss some general results on asymptotic normality of the parameter estimators, without giving the corresponding proofs in full detail. The interested reader is referred to Ljung and Caines (1979), Davis and Vinter (1985) and Ljung (1987).
Proposition 5.10.1 Let M be a model set that is uniformly stable with parameter set Θ, and let Z^∞ be a data sequence that is subject to Assumption 5.7.1.
Consider the parameter estimator
θ̂_N = arg min_θ V_N(θ, Z^N)
with
V_N(θ, Z^N) = (1/N) Σ_{t=1}^{N} ε(t, θ)².
Denote
V̄(θ) = Ē ε(t, θ)²
and let θ* be a unique value satisfying θ* = arg min_θ V̄(θ), V̄''(θ*) > 0.
Then, under weak conditions³,
√N (θ̂_N − θ*) → As N(0, P_θ)  (5.125)
i.e. the random variable √N (θ̂_N − θ*) converges in distribution to a Gaussian p.d.f. with zero mean and covariance matrix P_θ, where
P_θ = [V̄''(θ*)]^{-1} Q [V̄''(θ*)]^{-1}  (5.126)
Q = lim_{N→∞} N · E{[V_N'(θ*, Z^N)][V_N'(θ*, Z^N)]^T}  (5.127)
while (·)' and (·)'' respectively denote first and second derivative with respect to θ.
Moreover,
Cov(√N θ̂_N) → P_θ as N → ∞.  (5.128)
2
The given expression for the covariance matrix P_θ in the general setting of Proposition 5.10.1
is quite complicated. Interpretations of this P_θ in specific situations will be considered in
the next subsections.
The importance of the asymptotic normality result and the availability of a related covariance
matrix is that they can provide expressions for the confidence interval of the parameter
estimator related to a specified probability. For a normal distribution a standard confidence
interval is chosen by the 3σ-level, i.e.

θ^{(i)} = θ̂_N^{(i)} ± 3 √(P_θ^{(ii)}/N),

corresponding to a probability level of (more than) 99%. In this expression P_θ^{(ii)} is the
(i, i) matrix element of P_θ.
³ The weak conditions refer to the situation that √N · D_N → 0 as N → ∞, where
D_N = Ē (1/N) Σ_{t=1}^{N} [ψ(t, θ*) ε(t, θ*) − E ψ(t, θ*) ε(t, θ*)] and ψ(t, θ*) = (d/dθ) ε(t, θ)|_{θ=θ*}.
In the situation that S ∈ M and θ* = θ0, the prediction error at the converging estimate
satisfies

ε(t, θ0) = e(t),

being a sequence of independent, identically distributed random variables with zero mean
and variance σe².
In this situation the expressions for the asymptotic covariance matrix (5.126), (5.127) can
be simplified to

P_θ = σe² [Ē ψ(t, θ0) ψ^T(t, θ0)]^{-1}    (5.129)

with

ψ(t, θ0) := (∂/∂θ) ŷ(t|t−1; θ) |_{θ=θ0}.    (5.130)
The result of the Proposition has a natural interpretation. Note that ψ(t, θ) is the gradient
of ŷ(t|t−1; θ) = y(t) − ε(t, θ). Consequently, the larger this gradient, the smaller the
asymptotic covariance matrix. In other words, the more sensitive the predictor is with
respect to the parameter, the more accurately the parameter can be estimated. A small
variation of a parameter value will then lead to a large effect on the predictor and thus on
the prediction error.
It is also possible to estimate the expression (5.129) from data. In that case, one may want
to use as an estimate of P_θ:

P̂_N = σ̂_N² [ (1/N) Σ_{t=1}^{N} ψ(t, θ̂_N) ψ^T(t, θ̂_N) ]^{-1}    (5.131)

σ̂_N² = (1/N) Σ_{t=1}^{N} ε(t, θ̂_N)².    (5.132)
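As an illustration, the following lines sketch the computation of (5.131), (5.132) for a first
order ARX model estimated by linear regression (for which ψ(t, θ) = φ(t)); the system
parameters and signal lengths are arbitrary choices of this sketch.

% Sketch: estimating the asymptotic covariance matrix from data, cf. (5.131)-(5.132).
% For an ARX model the gradient psi(t,theta) equals the regression vector phi(t).
N = 1000; a0 = -0.7; b0 = 1; sig_e = 0.5;
u = randn(N,1); e = sig_e*randn(N,1);
y = filter([0 b0],[1 a0],u) + filter(1,[1 a0],e);   % y(t) = -a0*y(t-1)+b0*u(t-1)+e(t)
phi = [-[0; y(1:end-1)], [0; u(1:end-1)]];          % phi(t) = [-y(t-1); u(t-1)]'
theta = phi\y;                                      % least squares estimate of [a b]
epsv = y - phi*theta;                               % prediction errors eps(t,thetahat)
sigma2 = mean(epsv.^2);                             % sigma-hat_N^2, cf. (5.132)
PN = sigma2*inv((phi'*phi)/N);                      % P-hat_N, cf. (5.131)
covTheta = PN/N;                                    % approximate covariance of thetahat_N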
We will now present an example of the analytic calculation of the asymptotic covariance of
estimated parameters.
The input signal {u(t)} is a white noise process with variance σu², uncorrelated with the
white noise process {e(t)} that has variance σe².
Consider a data generating system given by

y(t) + a0 y(t−1) = b0 u(t−1) + e(t),    (5.133)

modelled within an ARX model set determined by

G(z, θ) = b z^{-1}/(1 + a z^{-1}),    H(z, θ) = 1/(1 + a z^{-1})

with θ = [a b]^T ranging over an appropriate parameter set Θ. The corresponding one-step-
ahead predictor is

ŷ(t|t−1; θ) = −a y(t−1) + b u(t−1)

and its gradient:

ψ(t, θ) = [−y(t−1)  u(t−1)]^T.

As a result the asymptotic covariance matrix can be written as

P_θ = σe² [ Ry(0)  −Ryu(0) ; −Ryu(0)  Ru(0) ]^{-1}.
We can compute the corresponding samples of the covariance functions that are present in
this expression by squaring the left and right hand sides of (5.133) and taking expectations:

Ry(0) = (b0² σu² + σe²)/(1 − a0²),

while Ru(0) = σu² and, since G0 is strictly proper and u is white noise,
Ryu(0) = E y(t−1)u(t−1) = 0. This leads to

Cov(â_N) ≈ (1/N) · σe² (1 − a0²)/(b0² σu² + σe²) = (1/N) · (1 − a0²)(σe²/σu²)/(b0² + σe²/σu²)    (5.134)

Cov(b̂_N) ≈ (1/N) · σe²/σu².    (5.135)
Note that the covariance of b̂_N increases linearly with the ratio of the noise and input
variance, whereas the covariance of â_N reaches an asymptotic value for an increasing ratio of
noise to input variance. This is due to the fact that the parameter b is present in the input/output
transfer function G only, whereas the parameter a is also present in the noise/output transfer
function H.    □
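The expressions (5.134), (5.135) can be checked numerically with a Monte Carlo simulation;
the following sketch (with arbitrarily chosen system parameters) compares empirical
variances of â_N, b̂_N with the asymptotic values.

% Monte Carlo verification (sketch) of (5.134)-(5.135).
a0 = -0.8; b0 = 1; sig_u = 1; sig_e = 2; N = 2000; M = 500;
est = zeros(M,2);
for m = 1:M
    u = sig_u*randn(N,1); e = sig_e*randn(N,1);
    y = filter([0 b0],[1 a0],u) + filter(1,[1 a0],e);
    phi = [-[0; y(1:end-1)], [0; u(1:end-1)]];
    est(m,:) = (phi\y)';                         % estimates of [a0 b0]
end
empVar = var(est);                               % empirical variances of a-hat, b-hat
covA = sig_e^2*(1-a0^2)/(N*(b0^2*sig_u^2+sig_e^2));   % (5.134)
covB = sig_e^2/(N*sig_u^2);                           % (5.135)
[empVar' [covA; covB]]                           % compare empirical vs. asymptotic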
The result of Proposition 5.10.2 also holds for linear regression models; for these models the
prediction error gradient ψ(t, θ0) can simply be calculated as

ψ(t, θ0) = −(∂/∂θ) ε(t, θ) |_{θ=θ0} = −(∂/∂θ) [y(t) − φ^T(t) θ] |_{θ=θ0} = φ(t),

which is in agreement with the asymptotic variance expression for linear regression models
that was given in (5.74).
The expression (5.129) for the asymptotic covariance matrix can also be interpreted in
a frequency domain context. To this end we exploit the predictor gradient ψ(t, θ0) that
appears in (5.129).
Note that

ε(t, θ) = H(q, θ)^{-1} [y(t) − G(q, θ) u(t)]    (5.137)

leading to

ψ(t, θ) = (1/H(q, θ)²) · (∂H(q, θ)/∂θ) · [y(t) − G(q, θ) u(t)] + (1/H(q, θ)) · (∂G(q, θ)/∂θ) · u(t)    (5.138)
        = (1/H(q, θ)²) [∂G(q, θ)/∂θ  ∂H(q, θ)/∂θ] [ H(q, θ)  0 ; −G(q, θ)  1 ] [ u(t) ; y(t) ].    (5.140)

Using y(t) = G(q, θ0) u(t) + H(q, θ0) e(t), and denoting T'(q, θ0) = (∂/∂θ)[G(q, θ)  H(q, θ)] |_{θ=θ0},
it follows that

ψ(t, θ0) = (1/H(q, θ0)) T'(q, θ0) [ u(t) ; e(t) ].    (5.141)

Applying Parseval's relation now shows that (5.129) can be rewritten as

P_θ = σe² [ (1/2π) ∫_{−π}^{π} (1/|H(e^{iω}, θ0)|²) T'(e^{iω}, θ0) Φ_χ(ω) T'(e^{iω}, θ0)^{*T} dω ]^{-1}    (5.142)

with Φ_χ(ω) the spectral density of χ(t) := [u(t)  e(t)]^T.
A similar analysis can be given for the case that the noise transfer function H0 is not
modelled correctly. Suppose that G and H are parametrized independently, and additionally
assume that there is a unique θ* such that

θ* = arg min_θ V̄(θ).    (5.145)

Denote

F(z) := H0(z)/H(z, θ*) = Σ_{i=0}^{∞} f_i z^{-i}    (5.146)

with f_0 = 1, since both H0 and H(z, θ*) are monic.
The resulting expression, formulated in terms of the coefficients f_i, provides the asymptotic
variance of parameter estimators for the case that model sets are applied that are not able
to describe the noise/output transfer function H0 accurately. This typically happens in the
case of a model set having an output error (OE) model structure.
Confidence regions for the parameter estimate can be characterized by contour lines of the
probability density function, i.e. the sets where f_θ(θ) = constant.
In the case of a Gaussian distribution N(θ0, P_θ), these contour lines are defined by the
relation

(θ̂_N − θ0)^T P_θ^{-1} (θ̂_N − θ0) = c    (5.151)

with c ∈ IR, c ≥ 0.
Since P_θ is positive semi-definite and symmetric, all its eigenvalues are real-valued and
≥ 0. Additionally the eigenvalue decomposition of P_θ can be written as

P_θ = W Λ W^T

with W orthogonal and Λ = diag(λ1, ⋯, λn). Substituting this into (5.151) gives

(θ̂_N − θ0)^T P_θ^{-1} (θ̂_N − θ0) = (θ̃_N − θ̃0)^T Λ^{-1} (θ̃_N − θ̃0) = c    (5.154)

or equivalently

Σ_{i=1}^{n} |θ̃_N^{(i)} − θ̃0^{(i)}|² / λ_i = c    (5.155)

with (·)^{(i)} denoting the i-th component of the vector considered. Equation (5.155) is the
characterization of an ellipsoid in the orthogonal basis spanned by the components of θ̃_N,
with center point θ̃0. The orthogonal basis is determined by the relation

θ̃_N = W^T θ̂_N = [ w1^T ; w2^T ; ⋯ ; w_n^T ] θ̂_N,

leading to θ̃_N^{(1)} = w1^T θ̂_N, etcetera. Therefore the principal axes of the ellipsoid are aligned
with the orthogonal eigenvectors w1, w2, ⋯ of P_θ, and they are determined in size by
2√(c λ_i), i = 1, ⋯, n. This is illustrated in figure 5.8 for a 2-dimensional example.
Figure 5.8: Ellipsoid indicating levels of equal probability density function for a normally
distributed estimator θ̂_N with covariance matrix P_θ having eigenvalues λ1, λ2 and eigenvectors
w1, w2; the principal axes have lengths 2√(c λ_i).
An α-probability confidence region for the parameter is now given by

D0 = { θ | (θ − θ0)^T P_θ^{-1} (θ − θ0) < c(α, n)/N }

where c(α, n) is the α-probability level for a χ²_n-distributed random variable having n
degrees of freedom, i.e. Pr{χ²_n < c(α, n)} = α. If the distribution function of the parameter
estimator is indeed correctly represented and Gaussian, the following probabilistic statement
can be made:

θ̂_N ∈ D0 with probability α.

Reversely, this statement can be rephrased as a statement on the value of the unknown
parameter vector θ0:

θ0 ∈ D_α with probability α,

with

D_α = { θ | (θ − θ̂_N)^T P_θ^{-1} (θ − θ̂_N) < c(α, n)/N }.
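In practice P_θ is replaced by its estimate P̂_N of (5.131). A minimal sketch of the resulting
per-parameter confidence intervals (the 3σ-level discussed before) is given below; constructing
the full ellipsoidal region D_α additionally requires the χ²_n-level c(α, n).

% Sketch: 3-sigma confidence intervals for the parameters of an estimated
% ARX model, based on the covariance estimate (5.131) (simulated data).
N = 1000; u = randn(N,1); e = 0.5*randn(N,1);
y = filter([0 1],[1 -0.7],u) + filter(1,[1 -0.7],e);
phi = [-[0; y(1:end-1)], [0; u(1:end-1)]];
thetahat = phi\y;
epsv = y - phi*thetahat;
covTheta = mean(epsv.^2)*inv(phi'*phi);     % = P-hat_N / N
stdTheta = sqrt(diag(covTheta));
intervals = [thetahat - 3*stdTheta, thetahat + 3*stdTheta]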
This definition implies that in Cov T̂_N there is no information on the separate real and
imaginary parts of T̂_N, but only information on its magnitude.
Denoting T*(q) = T(q, θ*) and T̂_N(q) = T(q, θ̂_N), a Taylor expansion shows that

T̂_N(e^{iω}) − T*(e^{iω}) ≈ T'(e^{iω}, θ*) (θ̂_N − θ*)

with T' the partial derivative of T with respect to θ. Using this relation it follows from
(5.125), (5.129) that

√N (T̂_N(e^{iω}) − T*(e^{iω})) →_{As} N(0, P(ω))    (5.159)

with

P(ω) = T'(e^{iω}, θ*) P_θ T'(e^{iω}, θ*)^{*T}    (5.160)

which formalizes the result for the asymptotic covariance matrix of T̂_N(e^{iω}).
The above result can be used to characterize uncertainty bounds on the frequency responses
of estimated models. The expression for P(ω) shows how the parameter covariance P_θ
induces a frequency response covariance.
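For a given parametrization, (5.160) can be evaluated directly. The following sketch does
this for the first order model of the earlier example, with an arbitrarily chosen (placeholder)
parameter covariance matrix P_θ, restricting attention to the G-part of T.

% Sketch of (5.160): frequency response variance of G(e^{iw},theta) induced
% by a parameter covariance Ptheta, for G(z,theta) = b z^{-1}/(1 + a z^{-1}).
a = -0.7; b = 1; Ptheta = 0.25*eye(2);       % Ptheta: placeholder value
w = linspace(0.01, pi, 200); z = exp(1i*w);
dGda = -b*z.^(-2)./(1 + a*z.^(-1)).^2;       % gradient dG/da
dGdb =  z.^(-1)./(1 + a*z.^(-1));            % gradient dG/db
Pw = zeros(size(w));
for k = 1:numel(w)
    Gp = [dGda(k); dGdb(k)];
    Pw(k) = real(Gp'*Ptheta*Gp);             % P(w), cf. (5.160)
end
loglog(w, Pw/1000)                           % variance for N = 1000, cf. (5.159)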
In a specific asymptotic situation, where not only the number of data but also the model
order tends to infinity, the asymptotic covariance expression will be shown to simplify to
an extremely simple - and appealing - expression.
Remark 5.10.5 The results discussed in this section can readily be extended to the multivariable
situation, as shown in Zhu (1989).
The pdf of a sequence of observations {y(1), ⋯, y(N)} is determined by the pdf of {e}; in
this situation the input signal u is again considered as a given - deterministic - sequence.
In the case of an exact parameter θ0 reflecting the transfer functions G0 and H0, one can
also write

y(t) = ŷ(t|t−1; θ0) + e(t).

If e(t) has a pdf f_e(x, t), with {e} a sequence of independent random variables, as is the
case with white noise, then it can be shown that the joint probability density function for
y^N = (y(1), ⋯, y(N)), conditioned on the given input sequence u^N, is given by

f_y(θ; x^N) = Π_{t=1}^{N} f_e(x(t) − x̂(t|t−1; θ)).

When substituting a measured sequence of input and output data, one obtains the - a
posteriori - likelihood of this measured data sequence:

L_y(θ; y^N) = Π_{t=1}^{N} f_e(y(t) − ŷ(t|t−1; θ))    (5.164)

which now has become a deterministic function of the unknown parameter θ. The maximum
likelihood (ML) estimator is defined by that value of θ that maximizes the likelihood
function (5.164). In other words: it selects that parameter value for which the observed
data would have been most likely to occur.
If f_e is a Gaussian pdf, i.e.

f_e(x, t) = (1/(σe √(2π))) e^{−x²/(2σe²)},

then

−log L_y(θ; y^N) = constant + N log σe + (1/2) Σ_{t=1}^{N} ε(t, θ)²/σe².    (5.165)
Maximization of the likelihood function is obtained by minimization of its minus log likelihood,
according to

θ̂_N^{ML} = arg min_θ [ N log σe + (1/2) Σ_{t=1}^{N} ε(t, θ)²/σe² ]    (5.166)
         = arg min_θ Σ_{t=1}^{N} ε(t, θ)²    (5.167)

where the second equality holds since, for fixed σe, the first term in (5.166) does not depend
on θ. For Gaussian distributed e, the ML estimator thus coincides with the least squares
prediction error estimator.
The Cramér-Rao lower bound

E (θ̂_N − θ0)(θ̂_N − θ0)^T ≥ J_N^{-1}

shows that the mean square error of any unbiased estimator is lower bounded by a
particular minimum value, specified by the Fisher information matrix J_N. As a consequence,
there is no other unbiased estimator which gives a smaller variance than the
ML estimator, which makes this estimator asymptotically efficient.
For a Gaussian pdf f_e the Fisher information matrix can be shown to satisfy

J_N = (1/σe²) E Σ_{t=1}^{N} ψ(t, θ0) ψ^T(t, θ0)

with ψ(t, θ0) := (∂/∂θ) ŷ(t|t−1; θ)|_{θ=θ0}. This shows the close resemblance with the expression for
the asymptotic variance of prediction error estimators as formulated in Proposition 5.10.2.
In the current context it holds that for any unbiased estimator θ̂_N:

cov(θ̂_N) ≥ J_N^{-1}.
First attention will be given to an important result in which the asymptotic model is
characterized in terms of properties formulated in the frequency domain. This result originates
from Wahlberg and Ljung (1986). Secondly, a less general result in terms of time-domain
properties will be shown, due to Mullis and Roberts (1976).
Under the given conditions the limiting criterion function can be written as

V̄(θ) = (1/2π) ∫_{−π}^{π} [ |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) ] / |H(e^{iω}, θ)|² dω    (5.174)

where

Φv(ω) = σe² |H0(e^{iω})|².

Since θ* = arg min_θ V̄(θ) is the value (or set of values) to which the parameter estimator
θ̂_N converges with probability 1, we now have a characterization of this limit estimate in
the frequency domain. This limit estimate is that value of θ that minimizes the expression
(5.174). The very important and illustrative formula (5.174) has become well known as
"formula (8.66)", pointing to the corresponding equation in the book Ljung (1987).
For interpretation of this criterion formulated as a frequency domain expression, it is however
more attractive to consider a slightly modified expression. Starting from (5.171), one
can write:

ε(t, θ) = H(q, θ)^{-1} [G0(q) − G(q, θ)] u(t) + H(q, θ)^{-1} [H0(q) − H(q, θ)] e(t) + e(t).

Now, because of the fact that H0(q) and H(q, θ) are monic for all θ, the second term on
the right hand side is dependent on e(t−1), e(t−2), ⋯ but not on e(t). And since {e(t)}
is a white noise process, this implies that the second and the third term on the right hand
side are uncorrelated; the first term is uncorrelated with both, since u and e are uncorrelated.
As a result, Ē ε(t, θ)² can be written as a summation of three terms, originating
from the three separate terms on the right hand side of the above equation.
Since the third term is independent of θ, it follows that

θ* = arg min_θ (1/2π) ∫_{−π}^{π} [ |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + |H0(e^{iω}) − H(e^{iω}, θ)|² σe² ] / |H(e^{iω}, θ)|² dω.
    (5.176)
This very nicely structured expression shows that the limiting estimate is obtained by
minimizing additive errors on G0 and H0, where the two error terms in the numerator are
weighted by the spectral density of their related signal (input u versus noise e). Additionally
there is a weighting with the inverse of the noise model. A couple of simple special cases
will be considered.
Fixed noise model
In the case of a fixed noise model H(q, θ) = H*(q), the limiting estimate minimizes
|G0(e^{iω}) − G(e^{iω}, θ)|² with frequency weighting Φu(ω)/|H*(e^{iω})|². This frequency
weighting function determines how the errors in the different frequency regions are weighted
with respect to each other. For those values of ω where Φu(ω)/|H*(e^{iω})|² is large, the
relative importance of the error term |G0(e^{iω}) − G(e^{iω}, θ)|² in the total misfit is
large, and consequently the estimated parameter will strive for a small error contribution
|G0(e^{iω}) − G(e^{iω}, θ)|² at that frequency. By choosing the fixed noise model and the
input spectrum, this weighting function can be influenced. In the next chapter we will pay
more attention to this phenomenon; a small numerical illustration is given below.
Note that this situation is similar to that of a parametrized noise model, with the difference
that in the latter case the (weighting) noise model H(q, θ*) is not known a priori.
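The illustration below plots this weighting function, assuming a white input (Φu(ω) = 1)
and an arbitrarily chosen first order fixed noise model.

% Sketch: frequency weighting phi_u(w)/|H*(e^{iw})|^2 for a fixed noise model
% H*(q) = 1/(1 + h1 q^{-1}) and white input; h1 is an arbitrary choice.
h1 = -0.9;
w = linspace(0.01, pi, 500);
Hstar = 1./(1 + h1*exp(-1i*w));        % fixed noise model on the unit circle
weight = 1./abs(Hstar).^2;             % = |1 + h1 e^{-iw}|^2 for phi_u = 1
semilogy(w, weight)                    % high-pass: high-frequency errors in G
                                       % get relatively heavy weight in the fit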
Prefiltering data
In section 5.5 we already discussed the possibility of prefiltering the prediction error with
a filter L(q), before applying the sum-of-squares criterion function. This leads to a filtered
prediction error ε_F(t, θ) = L(q) ε(t, θ), and the corresponding limiting estimate θ* is
determined as the set of minimizing arguments of the function

V̄_F(θ) = Ē ε_F(t, θ)².    (5.180)

Using similar arguments as before, this leads to the situation that

θ* = arg min_θ (1/2π) ∫_{−π}^{π} { |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + |H0(e^{iω}) − H(e^{iω}, θ)|² σe² } · |L(e^{iω})|² / |H(e^{iω}, θ)|² dω
    (5.181)

and for the situation of a fixed noise model:

θ* = arg min_θ (1/2π) ∫_{−π}^{π} |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) |L(e^{iω})|² / |H*(e^{iω})|² dω.    (5.182)

Again, as also discussed in section 5.5, the influence of the prefilter L is that it changes the
noise model H to L^{-1}H. The prefilter L can also be considered as a design variable,
available to the user for shaping the integrand to a desired form.
The situation of a fixed noise model especially refers to the case of model sets having a
so-called Output Error (OE) structure. In that case the fixed noise model satisfies H*(q) = 1.
Now the question is whether we can say anything about the asymptotic criterion optimization
in the situation that the noise model is also parametrized. Taking a closer look
at (5.176), it shows that the integrand essentially contains two terms that are both θ-dependent,
and the result is that the interpretation of the system approximation that is
involved becomes implicit.
The criterion (5.176) then reflects a compromise between
- minimizing the input/output error |G0(e^{iω}) − G(e^{iω}, θ)|² with frequency weighting Φu(ω)/|H(e^{iω}, θ)|², and
- minimizing the noise spectrum error |H0(e^{iω}) − H(e^{iω}, θ)|² with frequency weighting σe²/|H(e^{iω}, θ)|².
We will illustrate the frequency domain expression (5.176) in the following example, that
is taken from Ljung (1987).
Consider the noise-free data generating system

y(t) = G0(q) u(t)

with

G0(q) = 0.001 q^{-2} (10 + 7.4 q^{-1} + 0.924 q^{-2} + 0.1764 q^{-3}) / (1 − 2.14 q^{-1} + 1.553 q^{-2} − 0.4387 q^{-3} + 0.042 q^{-4}).

No disturbances act on the system. The input is a PRBS (see chapter 7) with a clock period
of one sample, which yields Φu(ω) ≈ 1 for all ω.
On the basis of available input and output data, this system was identified with the prediction
error method using a quadratic criterion function and prefilter L(q) = 1, with a model
set having an Output Error (OE) structure:

ŷ(t|t−1; θ) = (b1 q^{-1} + b2 q^{-2}) / (1 + f1 q^{-1} + f2 q^{-2}) · u(t).    (5.183)

Note that both G0 and the model set are strictly proper, i.e. b0 = 0.
Bode plots of the true system and of the resulting model are given in Figure 5.9, reflecting
G0(e^{iω}) and G(e^{iω}, θ̂_N) as functions of ω. We see that the model gives a good description of
the low-frequency properties but is less accurate at high frequencies. According to (5.174),
the limiting model is characterized by

θ* = arg min_θ (1/2π) ∫_{−π}^{π} |G0(e^{iω}) − G(e^{iω}, θ)|² dω    (5.184)

since H*(q) = 1 and Φu(ω) ≈ 1. Since the amplitude of the true system falls off by a
factor of 10² to 10³ for ω > 1, it is clear that errors at higher frequencies contribute only
marginally to the criterion (5.184); the good low-frequency fit is the result. A simulation
along these lines is sketched below.
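Assuming the MATLAB System Identification Toolbox (the routines are discussed later in
this chapter), the example can be reproduced roughly as follows; the data length and the
±1 input realization are choices of this sketch.

% Sketch of the example, assuming the System Identification Toolbox.
B0 = 0.001*[0 0 10 7.4 0.924 0.1764];   % numerator of G0 (coefficients of q^{-k})
A0 = [1 -2.14 1.553 -0.4387 0.042];     % denominator of G0
N = 1000; u = sign(randn(N,1));         % white +/-1 input, phi_u(w) ~ 1
y = filter(B0, A0, u);                  % noise-free output data
data = iddata(y, u, 1);
Moe  = oe(data, [2 2 1]);               % 2nd order OE model, cf. (5.183)
Marx = arx(data, [2 2 1]);              % 2nd order ARX model, for comparison
bode(Moe, Marx)                         % compare with Figure 5.9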
Figure 5.9: Amplitude and phase Bode plots of true system G0 (solid line), second order
OE model (dashed line), and second order ARX model (dotted line).
When instead an ARX model structure is used, the equivalent frequency weighting contains
the factor |A(e^{iω}, θ)|², so that the criterion penalizes high-frequency misfit much more.
As a result the model improves at higher frequencies at the cost of a worse low-frequency
behaviour.    □
The mechanism that is illustrated in the example is particularly directed towards the
frequency weighting of the additive error in the estimated input/output transfer function.
The compromise that is made between minimization of this term and a fitting of the noise
spectrum to the error spectrum, as mentioned before, will have an increasing influence on
the final estimate the higher the level of the noise disturbance on the data is.
Example 5.12.3 Consider the noise-free data generating system

S : y(t) = G0(q) u(t)

with

G0(q) = 0.9 q^{-1} / (1 − 1.8 q^{-1} + 0.8108 q^{-2}).

The system has order 2, having poles in z = 0.9 ± 0.028i and a zero in z = 0. In figure
5.11 the pulse response of the system G0 is shown, together with the pulse responses of
the asymptotic first order ARX and OE estimates.
The input signal was chosen to be a unit variance white noise process.
Note that in both model sets considered, the number of parameters equals 3.
In the OE model the 3 model parameters are used to arrive at a well-balanced approximation
of the system's pulse response. In the ARX model two parameters are used to fit g0(0) and
g0(1) exactly, while only one parameter remains for approximating the tail of the response.
Figure 5.11: Pulse response of second order data generating system G0(q) of example 5.12.3
(solid) and asymptotic first order ARX (dashed) and OE models (dash-dotted).
Example 5.12.4 A situation is considered that is similar to the previous example, however
now with a data generating system satisfying:

G0(q) = (1 − 0.9 q^{-1}) / (1 − 1.8 q^{-1} + 0.8108 q^{-2}).

Again this system has order two, having poles in z = 0.9 ± 0.028i and zeros in z = 0.9 and
z = 0.
In figure 5.12 again the pulse responses are sketched of the data generating system and of
asymptotic first order OE and ARX models.
Figure 5.12: Pulse response of second order data generating system G0(q) of example 5.12.4
(solid) and asymptotic first order ARX (dashed) and OE models (dash-dotted).
In contrast to the situation of example 5.12.3, the three responses now are very close.
Apparently the second process can be approximated much more accurately by a first order
model. This can be explained by the fact that the location of the zero in z = 0.9 is very
near to the two complex poles of the system. In the ARX model the effect of the exact fit
of g0(0) and g0(1) is now not as dramatic as in example 5.12.3.    □
Example 5.12.5 In the third example an experimentally obtained pulse response is used
as a data generating system. 85 pulse response samples are obtained from measurements on
(part of) the retina system. This pulse response is used for generating input/output data,
Figure 5.13: Pulse response of data generating (retina) system of example 5.12.5 (solid)
and of asymptotic 5th order ARX (dashed) and OE model (dash-dotted) with N = 3000.
and 5th order ARX and OE models are estimated under white input noise conditions. The
results are depicted in figure 5.13. The consequences of the exact fit of {g0(k)}_{k=0,⋯,5} for
the ARX model are considerable.    □
The results of this section support the more general frequency-domain results of section
5.12.2. As indicated there, an ARX model structure will generally lead to an estimate
that emphasizes the high-frequency behaviour of the model and achieves a less accurate
low-frequency behaviour. This same phenomenon is exhibited in the presented time-domain
result.
Note that the difference between ARX and OE estimates will be larger the slower the start
of the pulse response of G0. In the case when G0 contains a number of time-delays, the
ARX estimate will particularly identify these delays, at the cost of an accurate description
of the dynamics in the tail of the response.
The result of proposition 5.12.2 might suggest that it is a good strategy to drastically
increase the degree of the polynomial B(q, θ). The exact fit of the pulse response will then
take place over a larger interval. However, this strategy cannot be followed without care,
as with an increasing number of parameters to be estimated, the variance of the estimators
will increase.
Nonlinear optimization of the criterion function is generally performed by an iterative
(Newton-Raphson) search algorithm of the form

θ̂_N^{(k+1)} = θ̂_N^{(k)} − μ_k [V_N''(θ̂_N^{(k)})]^{-1} V_N'(θ̂_N^{(k)})

that employs both first and second derivative of the criterion function with respect to the
unknown parameter. Here θ̂_N^{(k)} denotes the parameter estimate in the k-th iterative step of
the algorithm, and μ_k is the step size. This step size can also be optimized by choosing:

μ_k = arg min_μ V_N(θ̂_N^{(k)} − μ [V_N''(θ̂_N^{(k)})]^{-1} V_N'(θ̂_N^{(k)}))    (5.192)

i.e. that value of μ that minimizes V_N(θ̂_N^{(k+1)}).
This algorithm is designed to give one-step convergence for quadratic functions. The local
convergence of the Newton-Raphson algorithm is quadratic, i.e. when θ̂_N^{(k)} is close to the
optimum point θ̂_N, then ‖θ̂_N^{(k+1)} − θ̂_N‖ is of the same magnitude as ‖θ̂_N^{(k)} − θ̂_N‖².
The Gauss-Newton or quasi-Newton method uses an approximation of the second derivative
V_N''(θ̂_N^{(k)}).
Note that V_N'(θ) = −(2/N) Σ_{t=1}^{N} ψ(t, θ) ε(t, θ) and

V_N''(θ) = (2/N) Σ_{t=1}^{N} ψ(t, θ) ψ^T(t, θ) − (2/N) Σ_{t=1}^{N} ψ'(t, θ) ε(t, θ)    (5.193)

where ψ'(t, θ) denotes the derivative of ψ(t, θ) with respect to θ. If θ = θ0, then ε(t, θ) will
asymptotically become a white noise signal, and the second term in this equation will vanish.
The approximation V_N''(θ) ≈ (2/N) Σ_{t=1}^{N} ψ(t, θ) ψ^T(t, θ) is then used in the iterative
search algorithm.
The Gauss-Newton algorithm generally exhibits a linear convergence.
For a general overview of iterative search algorithms, see Dennis and Schnabel (1983). A
minimal illustration for an output error model is sketched below.
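The following Gauss-Newton sketch for a first order output error model obtains the gradients
by filtering (in the spirit of the ARMAX example that follows) and fixes the step size at
μ_k = 1; all numerical values are arbitrary choices of the sketch.

% Gauss-Newton iterations (sketch) for the OE model
% yhat(t|t-1;theta) = b q^{-1}/(1 + f q^{-1}) u(t), with theta = [f; b].
N = 2000; u = randn(N,1);
y = filter([0 1],[1 -0.7],u) + 0.1*randn(N,1);   % data from f0 = -0.7, b0 = 1
theta = [0; 0.5];                                % initial estimate
for k = 1:20
    f = theta(1); b = theta(2);
    yhat = filter([0 b],[1 f],u);
    epsv = y - yhat;
    dydf = -filter([0 1],[1 f],yhat);            % d yhat/d f, obtained by filtering
    dydb =  filter([0 1],[1 f],u);               % d yhat/d b
    psi  = [dydf, dydb];
    theta = theta + (psi'*psi)\(psi'*epsv);      % Gauss-Newton update, mu_k = 1
end
theta                                            % close to [-0.7; 1]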
The gradient of the prediction, ψ(t, θ), has to be available in order to apply the iterative
search methods as sketched above. Generally - for common model sets - this gradient can be
obtained by filtering the data, as is illustrated in the next example considering an ARMAX
model set.
For an ARMAX model A(q^{-1}) y(t) = B(q^{-1}) u(t) + C(q^{-1}) e(t) one can derive:

C(q^{-1}) · (∂/∂a_i) ŷ(t|t−1; θ) = −y(t−i)    (5.196)
C(q^{-1}) · (∂/∂b_j) ŷ(t|t−1; θ) = u(t−j)    (5.197)
C(q^{-1}) · (∂/∂c_k) ŷ(t|t−1; θ) + q^{-k} ŷ(t|t−1; θ) = y(t−k)    (5.198)

Consequently, with the pseudo-regression vector

φ(t, θ) = [−y(t−1) ⋯ −y(t−na)  u(t−1) ⋯ u(t−nb)  ε(t−1, θ) ⋯ ε(t−nc, θ)]^T

it follows that

ψ(t, θ) = C(q^{-1})^{-1} φ(t, θ).    (5.199)
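In other words, ψ(t, θ) is obtained by filtering the pseudo-regression vector with the inverse
of the (current) C-polynomial, e.g.:

% Sketch of (5.199): the ARMAX predictor gradient by filtering.
C = [1 -0.5];              % example C(q^{-1}) = 1 - 0.5 q^{-1}
phi = randn(100,3);        % placeholder pseudo-regression vectors (rows = time)
psi = filter(1, C, phi);   % psi(t,theta) = C(q^{-1})^{-1} phi(t,theta), columnwise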
It may be apparent that from a computational point of view linear regression schemes are
much more attractive than iterative search methods. In the latter situation one generally
has the problem that only convergence to a local minimum can be achieved, whereas it is of
course of interest to obtain the global minimum of the criterion function. This basic problem
in nonlinear optimization can only be dealt with by starting the iterative search
in several points of the parameter space, and thus by scanning this space for convergence to
the global minimum. Finding an appropriate initial parameter value can be an important
task that may contribute considerably to the ultimate result. If we start the iterative search
at a good estimate, then the optimization procedure will only improve the result. For
constructing initial model estimates, several methods are available and commonly applied:
- High order FIR modelling and consecutively model reduction by approximate realization
  techniques, as discussed in chapter 4.
- High order ARX modelling and model reduction, choosing among the wide class of
  model reduction techniques, e.g. model reduction based on (truncated) balanced
  realizations.
- The iterative method of Steiglitz and McBride (1965), in which input and output data
  are repeatedly prefiltered, as depicted in the scheme below.
(Block diagram: the data generating system y(t) = G0(q)u(t) + H0(q)e(t); both u(t) and
y(t) are filtered by 1/A(q^{-1}, θ̂_N^{(k−1)}), after which a prediction error ε_F(t, θ) is formed
with A(q^{-1}, θ) and the model parameters are estimated by linear regression.)
If this procedure converges, then it follows that the resulting prediction error is an
output error. However, this does not necessarily make the algorithm equivalent to an
output error identification algorithm, as this output error is not necessarily minimized
by the iterative approach. It has been shown by Stoica and Soderstrom (1981) that
local convergence (and sometimes global convergence) to the true parameter vector
can be obtained under the condition that the data generating system has an output
error structure with white additive noise on the output. However, note that in a
situation of approximate modelling, the result of this iterative algorithm will generally
be different from a (real) output error estimate.
The prediction error identification methods discussed in this and the previous chapter can
also be implemented in a recursive way. This implies that models are estimated on-line while
gathering the measurement data. The parameter estimate θ̂_N obtained after N data points
is updated to θ̂_{N+1} when - at the next time instant - additional data is obtained. We will
not discuss these methods further here, but refer to the standard references Ljung (1987),
Soderstrom and Stoica (1989), and especially to Ljung and Soderstrom (1983).
M = arx(DATA,NN)
M = oe(DATA,NN)
M = bj(DATA,NN)
M = armax(DATA,NN)
M = pem(DATA,NN)
In the respective m-files, parameters are estimated on the basis of output and input data
present in the IDDATA-object DATA, according to a model structure determined by the name
of the routine, and with the number of parameters per polynomial indicated in

NN = [na nb nc nd nf nk]

referring to the general model structure

A(q, θ) y(t) = (B(q, θ)/F(q, θ)) q^{-nk} u(t) + (C(q, θ)/D(q, θ)) e(t).

The structure vector NN in the above routines only requires those integers that are relevant
for the particular model structure. The PEM command refers to the use of the general
structure including all five polynomials.
The output M is an estimated model in the IDPOLY-format as utilized in the System
Identification Toolbox; it includes a nominal estimate, together with an estimated parameter
variance.
Parameter estimates can be made visible on the screen, e.g. with the command present(M),
as illustrated below.
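For example (simulated data; toolbox syntax as above):

% Example use of the estimation routines (System Identification Toolbox).
N = 1000; u = randn(N,1);
y = filter([0 1],[1 -0.7],u) + 0.1*randn(N,1);
DATA = iddata(y, u, 1);            % IDDATA object with sampling interval 1
Marx = arx(DATA, [2 2 1]);         % NN = [na nb nk]
Moe  = oe(DATA, [2 2 1]);          % NN = [nb nf nk]
present(Moe)                       % display estimate and parameter uncertainties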
5.15 Summary
In this chapter we have discussed the basic aspects and results that are involved in
identification of linear dynamic models by prediction error identification methods. We have
restricted attention to the so-called least squares criterion function, and to single input
single output (SISO) models.
Key ingredients of an identification procedure are
- the model set M = {(G(q, θ), H(q, θ))} determined by a choice of structure (ARX,
  FIR, OE, BJ etcetera) and by appropriate degrees of the different polynomials;
- the identification criterion (1/N) Σ_{t=1}^{N} ε(t, θ)².
The basic reference for the material presented in this chapter is Ljung (1987, 1999). In
Soderstrom and Stoica (1989) much of this material is also presented, as well as extensions
to the multivariable situation.
Appendix
Since θ0 ∈ Θ_T(S, M), we have ε(t, θ0) = −H0^{-1}(q) G0(q) u(t) + H0^{-1}(q) y(t) = e(t),
and consequently V̄(θ0) = σe².
Now we have to analyze whether V̄(θ) = V̄(θ0) implies that [G(q, θ) H(q, θ)] = [G0(q) H0(q)].
Denote ΔG(q) = G0(q) − G(q, θ) and ΔH(q) = H0(q) − H(q, θ). Then

ε(t, θ) = H(q, θ)^{-1} [ΔG(q) u(t) + ΔH(q) e(t)] + e(t).

Because {e(t)} is a white noise process, Ē[ΔH(q) e(t)]² = 0 implies that ΔH(q) = 0, leading to
H(q, θ) = H0(q).
In order to arrive at the implication

V̄(θ) − V̄(θ0) = 0  ⟹  G(q, θ) = G0(q) and H(q, θ) = H0(q)    (5A.10)

additional conditions on the experiment are required, as formulated in the proposition.
Proof of Proposition 5.10.1: A Taylor expansion of V_N' around θ* yields

√N (θ̂_N − θ*) ≈ −[V̄''(θ*)]^{-1} √N V_N'(θ*, Z^N)

with

V_N'(θ*, Z^N) = −(2/N) Σ_{t=1}^{N} ψ(t, θ*) ε(t, θ*).    (5A.13)

If the terms ψ(t, θ*) ε(t, θ*) were independent over t, a standard central limit theorem would
provide

(1/√N) Σ_{t=1}^{N} ψ(t, θ*) ε(t, θ*) →_{As} N(0, Q̄)

with

Q = lim_{N→∞} N · Ē{[V_N'(θ*, Z^N)][V_N'(θ*, Z^N)]^T}.
The different terms of V_N' are not independent, but due to the uniform stability of the
model set the dependence between distant terms decreases, and a similar result can
still be shown to hold. We will not discuss the additional step of proving (5.128).    □
Proof of Proposition 5.10.2: Substituting ε(t, θ0) = e(t) gives

Q = lim_{N→∞} (4/N) Σ_{t=1}^{N} Σ_{s=1}^{N} Ē ψ(t, θ0) e(t) e(s) ψ^T(s, θ0).

Since {e(t)} is a white noise sequence, and ψ(t, θ0) is uncorrelated with e(s) for s ≥ t, it
follows that

Q = 4 σe² Ē[ψ(t, θ0) ψ^T(t, θ0)].

Similarly, V̄''(θ0) = 2 Ē[ψ(t, θ0) ψ^T(t, θ0)], while the second term on the right hand side
(cf. (5.193)) equals 0, since ψ(t, θ0) and its derivative are uncorrelated with e(t).
Substituting the above expressions in (5.126) shows the result.    □
and consequently

Ryu(τ) = g*(τ)    (5A.16)

with g* the pulse response of the limit model. Substitution of φ(t, θ) into (5A.14) then leads
to (5A.17), (5A.18). As g0(k) = 0 for k < 0, it follows directly that (5A.17) can be extended
to the interval −∞ < k ≤ nb − 1. Then the two (recursive) equations (5A.17), (5A.18) are
identical over the interval −∞ < k ≤ nb − 1, which proves the result.    □
Bibliography
K.J. Astrom (1980). Maximum likelihood and prediction error methods. Automatica, 16,
pp. 551-574.
M.H.A. Davis and R.B. Vinter (1985). Stochastic Modelling and Control. Chapman and
Hall, London.
J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ.
P.J. Huber (1981). Robust Statistics. Wiley, New York.
L. Ljung and P.E. Caines (1979). Asymptotic normality of prediction error estimation for
approximate system models. Stochastics, 3, pp. 29-46.
L. Ljung and T. Soderstrom (1983). Theory and Practice of Recursive Identification. MIT
Press, Cambridge, Mass.
L. Ljung and Z.D. Yuan (1985). Asymptotic properties of black-box identification of transfer
functions. IEEE Trans. Automat. Contr., AC-30, pp. 514-530.
L. Ljung (1985). Asymptotic variance expressions for identified black box transfer function
models. IEEE Trans. Automat. Contr., AC-30, pp. 834-844.
L. Ljung (1987). System Identification - Theory for the User. Prentice-Hall, Englewood
Cliffs, NJ. Second edition, 1999.
M. Milanese, J. Norton, H. Piet-Lahanier and E. Walter (Eds.) (1996). Bounding Approaches
to System Identification. Plenum Press, New York.
C.T. Mullis and R.A. Roberts (1976). The use of second order information in the approximation
of discrete time linear systems. IEEE Trans. Acoust., Speech, Signal Processing,
ASSP-24, pp. 226-238.
T. Soderstrom and P. Stoica (1983). Instrumental Variable Methods for System Identification.
Lecture Notes in Control and Information Sciences, Springer-Verlag, New York.
T. Soderstrom and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead,
U.K.
K. Steiglitz and L.E. McBride (1965). A technique for the identification of linear systems.
IEEE Trans. Automat. Control, AC-10, pp. 461-464.
P. Stoica and T. Soderstrom (1981). The Steiglitz-McBride identification algorithm revisited
- convergence analysis and accuracy aspects. IEEE Trans. Automat. Control,
AC-26, no. 3, pp. 712-717.
P. Stoica, T. Soderstrom, A. Ahlen and G. Solbrand (1984). On the asymptotic accuracy
of pseudo-linear regression algorithms. Int. J. Control, vol. 39, no. 1, pp. 115-126.
P. Stoica, T. Soderstrom, A. Ahlen and G. Solbrand (1984). On the convergence of pseudo-linear
regression algorithms. Int. J. Control, vol. 41, no. 6, pp. 1429-1444.
B. Wahlberg and L. Ljung (1986). Design variables for bias distribution in transfer function
estimation. IEEE Trans. Automat. Contr., AC-31, pp. 134-144.
Y.C. Zhu (1989). Black-box identification of MIMO transfer functions: asymptotic properties
of prediction error models. Int. J. Adaptive Control and Sign. Processing, vol. 3,
pp. 357-373.
Chapter 6

System Identification with Generalized Orthonormal Basis Functions

Abstract¹
A least squares identification method is studied that estimates a finite number of expansion
coefficients in the series expansion of a transfer function, where the expansion is in terms
of recently introduced generalized basis functions. The basis functions are orthogonal in
H2 and generalize the pulse, Laguerre and Kautz bases. One of their important properties
is that, when chosen properly, they can substantially increase the speed of convergence of
the series expansion. This leads to accurate approximate models with only few coefficients
to be estimated. Explicit bounds are derived for the bias and variance errors that occur in
the parameter estimates as well as in the resulting transfer function estimates.
6.1 Introduction
The use of orthogonal basis functions for the Hilbert space H2 of stable systems has a long
history in modelling and identification of dynamical systems. The main part of this work
dates back to the classical work of Lee (1933) and Wiener (1946), as also summarized in
Lee (1960).
In the past decades orthogonal basis functions, such as the Laguerre functions, have been
employed for the purpose of system identification by e.g. King and Paraskevopoulos (1979),
Nurges and Yaaksoo (1981), Nurges (1987). In these works the input and output signals of
a dynamical system are transformed to a (Laguerre) transform domain, being induced by
the orthogonal basis for the signal space. Subsequently, more or less standard identification
techniques are applied to the signals in this transform domain. The main motivation for
this approach has been directed towards data reduction, as the representation of the
measurement data in the transform domain becomes much more efficient once an appropriate
basis is chosen.

¹ This chapter is a reprint of P.M.J. Van den Hof, P.S.C. Heuberger and J. Bokor (1995). System
identification with generalized orthonormal basis functions. Automatica, Vol. 31, pp. 1821-1834. As a
result, notation may be slightly different from the notation in the other chapters. For a more compact
treatment see also Heuberger et al. (2005).
In Wahlberg (1990, 1991, 1994a) orthogonal functions are applied for the identification of
a finite sequence of expansion coefficients. Given the fact that every stable system has a
unique series expansion in terms of a prechosen basis, a model representation in terms of a
finite length series expansion can serve as an approximate model, where the coefficients of
the series expansion can be estimated from input-output data.
Consider a stable system G(z) ∈ H2, written as

G(z) = Σ_{k=0}^{∞} G_k z^{-k}    (6.1)

with {G_k}_{k=0,1,2,⋯} the sequence of Markov parameters. Let {f_k(z)}_{k=0,1,2,⋯} be an
orthonormal basis for the set of systems H2. Then there exists a unique series expansion:

G(z) = Σ_{k=0}^{∞} L_k f_k(z).    (6.2)

Note that f_k(z) = z^{-k} is one of the possibilities for choosing such a basis. If a model of the
system G is represented by a finite length series expansion:

Ĝ(z) = Σ_{k=0}^{n−1} L_k f_k(z),    (6.4)

then it is easily understandable that the accuracy of the model, in terms of the minimal
possible deviation between system and model (in any prechosen norm), will be essentially
dependent on the choice of basis functions f_k(z). Note that the choice f_k(z) = z^{-k}
corresponds to the use of so-called FIR (finite impulse response) models (Ljung, 1987).
As the accuracy of the models is limited by the basis functions, the development of appropriate
basis functions is a topic that has gained considerable interest. The issue here is that
it is profitable to design basis functions that reflect the dominant dynamics of the process
to be modelled.
Laguerre functions are determined by

f_k(z) = √(1 − a²) · z (1 − az)^k / (z − a)^{k+1},    |a| < 1,    (6.5)

see e.g. Gottlieb (1938) and Szego (1975), and they exhibit the choice of a scalar design
variable a that has to be chosen in a range that matches the dominating (first order) dynamics
of the process to be modelled. Considerations for optimal choices of a are discussed e.g. in
Clowes (1965) and Fu and Dumont (1993). For moderately damped systems, Kautz functions
have been employed, which actually are second order generalizations of the Laguerre
functions, see Kautz (1954), Wahlberg (1990, 1994a, 1994b).
Recently a generalized set of orthonormal basis functions has been developed that is generated
by inner (all-pass) transfer functions of any prechosen order, Heuberger and Bosgra
(1990), Heuberger (1991), Heuberger et al. (1992, 1993). This type of basis functions
generalizes the Laguerre and Kautz-type bases, which appear as special cases when choosing
first order and second order inner functions. Given any inner transfer function (with any
set of eigenvalues), an orthonormal basis for the space of stable systems H2 (and similarly
for the signal space ℓ2) can be constructed.
Using generalized basis functions that contain dynamics can have important advantages in
identification and approximation problems. It has been shown in Heuberger et al. (1992,
1993) that if the dynamics of the basis generating system and the dynamics of the system
to be modelled approach each other, the convergence rate of a series expansion of the system
becomes very fast. Needless to say, the identification of expansion coefficients in a series
expansion benefits very much from a fast convergence rate; the number of coefficients to be
determined to accurately model the system becomes smaller. This concerns a reduction of
both bias and variance contributions in the estimated models.
In this paper, we will focus on the properties of the identification scheme that estimates
expansion coefficients in such series expansions, by using simple (least squares) linear
regression algorithms. To this end we will consider the following problem set-up, compatible
with the standard framework for identification as presented in Ljung (1987).
Consider a linear, time-invariant, discrete-time data generating system:

y(t) = G0(q) u(t) + v(t)

with u the input signal and v an unmeasurable noise disturbance. A model of the system is
represented by a finite series expansion,

ŷ(t, θ) = Σ_{k=0}^{n−1} L_k(θ) f_k(q) u(t),    (6.7)

leading to the prediction error

ε(t, θ) = y(t) − Σ_{k=0}^{n−1} L_k(θ) f_k(q) u(t).    (6.8)

The least squares parameter estimate is given by

θ̂_N(n) = arg min_θ (1/N) Σ_{t=1}^{N} ε(t, θ)²,    (6.9)

with corresponding transfer function estimate

G(z, θ̂_N) = Σ_{k=0}^{n−1} L_k(θ̂_N) f_k(z).    (6.10)
This identification method has some favorable properties. Firstly, it is a linear regression
scheme, which leads to a simple analytical solution; secondly, it is of the output-error type,
which has the advantage that the input/output system G0(z) can be estimated
consistently whenever the unknown noise disturbance v(t) is uncorrelated with the input
signal (Ljung, 1987).
However, it is well known that for moderately damped systems, and/or in situations of high
sampling rates, it may take a large value of n, the number of coefficients to be estimated,
in order to capture the essential dynamics of the system G into its model. If we are able
to improve the basis functions in such a way that an accurate description of the model to
be estimated can be achieved by a small number of coefficients in a series expansion, then
this is beneficial for both bias and variance of the model estimate.
In this paper we will analyse bias and variance errors for the asymptotic parameter and
transfer function estimates, for the general class of orthogonal basis functions as recently
introduced in Heuberger and Bosgra (1990), Heuberger (1991), Heuberger et al. (1992,
1993). The results presented generalize corresponding results as provided by Wahlberg
(1991, 1994a) for the Laguerre and Kautz bases.
In section 6.2 we will first present the general class of orthogonal basis functions, and the
least squares identification will be formulated in section 6.3. Then in sections 6.4 and 6.5
we will introduce and analyse the Hambo transform of signals and systems, induced by
the generalized orthogonal basis. This transform plays an important role in the statistical
analysis of the asymptotic parameter estimates. This asymptotic analysis is completed in
section 6.6. A simulation example in section 6.7 illustrates the identification method, and
the paper is concluded with some summarizing remarks.
We will use the following notation.

(·)^T           Transpose of a matrix
IR^{p×m}        Set of real-valued matrices with dimension p × m
C               Set of complex numbers
Z+              Set of nonnegative integers
ℓ2[0, ∞)        Space of squared summable sequences on the time interval Z+
ℓ2^{p×m}[0, ∞)  Space of matrix sequences {F_k ∈ IR^{p×m}}_{k=0,1,2,⋯} such that
                Σ_{k=0}^{∞} tr(F_k^T F_k) is finite
H2^{p×m}        Set of real p × m matrix functions that are squared integrable on the
                unit circle
‖·‖2            ℓ2-norm of a vector; induced ℓ2-norm or spectral norm of a constant
                matrix, i.e. its maximum singular value
‖·‖1            ℓ1-norm of a vector; induced ℓ1-norm of a matrix operator
‖·‖∞            ℓ∞-norm of a vector; induced ℓ∞-norm of a matrix operator
‖·‖_{H2}        H2-norm of a stable transfer function
Ē               lim_{N→∞} (1/N) Σ_{t=1}^{N} E
⊗               Kronecker matrix product
e_i             i-th Euclidean basis vector in IR^n
I_n             n × n identity matrix
Δ(t)            Kronecker delta function, i.e. Δ(t) = 1, t = 0; Δ(t) = 0, t ≠ 0
:=              is defined by
The scalar transfer function G(z) has an nb-dimensional state space realization
(A, B, C, D), with A ∈ IR^{nb×nb} and B, C, D of appropriate dimensions, if
G(z) = C(zI − A)^{-1} B + D.
Theorem 6.2.1 Let Gb(z) be a scalar inner function with McMillan degree nb > 0, having
a minimal balanced realization (A, B, C, D). Denote

V_k(z) := z (zI − A)^{-1} B [Gb(z)]^k.    (6.11)

Then the sequence of scalar rational functions {e_i^T V_k(z)}_{i=1,⋯,nb; k=0,⋯,∞} forms an
orthonormal basis for the Hilbert space H2.    □

Note that these basis functions exhibit the property that they can incorporate system
dynamics in a very general way. One can construct an inner function Gb from any given set
of poles, and thus the resulting basis can incorporate dynamics of any complexity, combining
e.g. both fast and slow dynamics in damped and resonant modes. A direct result is that for
any specifically chosen V_k(z), any strictly proper transfer function G(z) ∈ H2 has a unique
series expansion

G(z) = z^{-1} Σ_{k=0}^{∞} L_k V_k(z)    with {L_k} ∈ ℓ2^{1×nb}[0, ∞).    (6.12)
For specific choices of Gb(z), well known classical basis functions can be generated.
a. With Gb(z) = z^{-1}, having minimal balanced realization (0, 1, 1, 0), the standard pulse
basis V_k(z) = z^{-k} results.
b. Choosing a first order inner function Gb(z) = (1 − az)/(z − a), with some real-valued a,
|a| < 1, and balanced realization

(A, B, C, D) = (a, √(1 − a²), √(1 − a²), −a),    (6.13)

the Laguerre basis functions (6.5) are obtained.
c. Similarly the Kautz functions (Kautz, 1954; Wahlberg, 1990, 1994a) originate from
the choice of a second order inner function

Gb(z) = (−cz² + b(c − 1)z + 1) / (z² + b(c − 1)z − c)    (6.15)

with some real-valued b, c satisfying |c|, |b| < 1. A balanced realization of Gb(z) can
be found to be given by

A = [ b   √(1 − b²) ; c√(1 − b²)   −bc ]    (6.16)
B = [ 0 ; √(1 − c²) ]    (6.17)
C = [ γ2  γ1 ],    D = −c    (6.18)

with γ1 = −b√(1 − c²) and γ2 = √((1 − c²)(1 − b²)), see also Heuberger et al. (1992,
1993).
The generalized orthonormal basis for H2 also induces a similar basis for the signal space
ℓ2[0, ∞) of squared summable sequences, through inverse z-transformation to the signal
domain. Denoting

V_k(z) = Σ_{ℓ=0}^{∞} φ_k(ℓ) z^{-ℓ},    (6.19)

it follows that {e_i^T φ_k(ℓ)}_{i=1,⋯,nb; k=0,⋯,∞} is an orthonormal basis for the signal space
ℓ2[0, ∞). These ℓ2 basis functions can also be constructed directly from Gb and its balanced
realization (A, B, C, D), see Heuberger et al. (1992); a numerical construction is sketched below.
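As a numerical sketch for the Laguerre case (nb = 1, with the balanced realization (6.13)),
the functions φ_k can be generated recursively using Gb(q)φ_k(t) = φ_{k+1}(t) (cf. the proof
of Proposition 6.4.3 below), and their orthonormality verified:

% Sketch: generate the first ell2 basis functions phi_k for the Laguerre case
% (nb = 1) and verify their orthonormality numerically.
a = 0.6;
bGb = [-a 1]; aGb = [1 -a];        % Gb(z) = (1 - a z)/(z - a), in powers of z^{-1}
T = 3000; d = [1; zeros(T-1,1)];   % long horizon so that tails are negligible
phi = zeros(T,5);
phi(:,1) = filter(sqrt(1-a^2), aGb, d);        % phi_0: impulse response of V0
for k = 2:5
    phi(:,k) = filter(bGb, aGb, phi(:,k-1));   % phi_k = Gb(q) phi_{k-1}
end
phi'*phi                                       % approximately the 5x5 identity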
The least squares identification of expansion coefficients is now based on the prediction
error

ε(t, θ) = y(t) − Σ_{k=0}^{n−1} L_k V_k(q) u(t−1).    (6.20)

We will assume that the input signal {u(t)} is a quasi-stationary signal (Ljung, 1987)
having a spectral density Φu(ω) with a stable spectral factor Hu(e^{iω}), i.e.
Φu(ω) = Hu(e^{iω}) Hu(e^{−iω}).
We will further denote

x_k(t) := V_k(q) u(t−1),    φ(t) := [x_0^T(t) ⋯ x_{n−1}^T(t)]^T,    θ := [L_0 ⋯ L_{n−1}]^T,

and consequently

ε(t, θ) = y(t) − θ^T φ(t).    (6.24)

Following Ljung (1987), under weak conditions the parameter estimate θ̂_N(n) given by (6.9)
will converge with probability 1 to the asymptotic estimate

θ*(n) = R(n)^{-1} F(n)

with

R(n) = Ē φ(t) φ^T(t),    F(n) = Ē φ(t) y(t).    (6.26)
For the analysis of bias and variance errors of this identification scheme, we will further use
the following notation:

G0(z) = z^{-1} Σ_{k=0}^{∞} L_k^{(0)} V_k(z)
θ0 = [L_0^{(0)} ⋯ L_{n−1}^{(0)}]^T
θ_e = [L_n^{(0)}  L_{n+1}^{(0)}  ⋯]^T

In the analysis of the asymptotic parameter estimate θ*(n), the matrix R(n) will play an
important role. Note that this matrix has a block-Toeplitz structure, with the (j, ℓ)
block-element given by Ē x_j(t) x_ℓ^T(t). For the analysis of the properties of this block-Toeplitz
matrix, we will employ a signal and system transformation that is induced by the generalized
basis. This transformation is presented and discussed in the next two sections.
Remark 6.3.1 In this paper we consider the identification of strictly proper systems by
strictly proper models. All results will appear to hold similarly true for the case of proper
systems and corresponding proper models, by only adapting the notation in this section.
Let {V_k(z)}_{k=0,⋯,∞} be an orthonormal basis, as defined in section 6.2, and let
{φ_k(t)}_{k=0,⋯,∞} be as defined in (6.19). Then for any signal x(t) ∈ ℓ2^m there exists a
unique transformation

x̃(k) := Σ_{t=0}^{∞} φ_k(t) x^T(t).    (6.29)

We will refer to x̃(λ), the z-transform of the sequence {x̃(k)}, as the Hambo transform of
the signal x(t). Note that x̃ ∈ ℓ2^{nb×m} and x̃(λ) ∈ H2^{nb×m}.
Now consider a scalar system y(t) = G(q)u(t) with G ∈ H2 and u, y signals in ℓ2. Then
there exists a Hambo-transformed system G̃(λ) ∈ H2^{nb×nb} such that

ỹ(λ) = G̃(λ) ũ(λ),

where the shift operator now acts on the expansion-sequence index k. The construction of
this transformed system is given in the following proposition.
Proposition 6.4.1 Consider a scalar system G ∈ H2 relating input and output signals
according to y(t) = G(q)u(t) with u, y ∈ ℓ2, and G(z) = Σ_{k=0}^{∞} g_k z^{-k}. Then
G̃(λ) = Σ_{k=0}^{∞} g_k [N(λ)]^k, with N(λ) the Hambo transform of the shift z^{-1}.    □

The interpretation of this proposition is that the Hambo transform of any system G can be
obtained by a simple variable substitution on the original transfer function, where the
variable transformation concerned is given by z^{-1} = N(λ).
Note that this result generalizes the situation of the corresponding Laguerre transformation,
where it concerns the variable transformation z = (λ + a)/(1 + aλ) (see also Wahlberg, 1991).
However, due to the fact that in our case the McMillan degree of the inner function that
generates the basis is nb ≥ 1, the Hambo-transformed system increases in input/output
dimension to G̃ ∈ H2^{nb×nb}. Note that, as G is scalar, N(λ) is an nb × nb rational
transfer function.

Proposition 6.4.2 Let y(t) = G(z)u(t) with u, y ∈ ℓ2^m. Then

ỹ(λ) = G̃(λ) ũ(λ)    (6.37)

with G̃(λ) as defined in (6.35).    □
Proof: For m = 1 the result is shown in Proposition 6.4.1. If we write the relation between
y and u componentwise, i.e. y_i(t) = G(z)u_i(t), it follows from the mentioned Proposition
that ỹ_i(λ) = G̃(λ)ũ_i(λ). It follows directly that
ỹ(λ) = [ỹ_1(λ) ⋯ ỹ_m(λ)] = G̃(λ)[ũ_1(λ) ⋯ ũ_m(λ)] = G̃(λ)ũ(λ).    □
One of the results that we will need in the analysis of least squares related block-Toeplitz
matrices is formulated in the following Proposition.
Proposition 6.4.3 Consider a scalar inner transfer function Gb(z) generating an orthogonal
basis as discussed before. Then

G̃b(λ) = λ^{-1} I_{nb}.    (6.38)

Proof: It can simply be verified that for all k, Gb(q)φ_k(t) = φ_{k+1}(t). With Proposition
6.4.2 it follows that φ̃_{k+1}(λ) = G̃b(λ) φ̃_k(λ).
Since for each k, φ̃_k(λ) = Σ_{t=0}^{∞} I_{nb} Δ(t − k) λ^{-t} = λ^{-k} I_{nb}, it follows that
G̃b(λ) = λ^{-1} I_{nb}.    □
Next we will consider a result that reects some properties of the back-transformation of
Hambo-transformed systems to the original domain, i.e. the inverse Hambo transform.
The following lemma relates quadratic signal properties to properties of the transformed
signals.
Proof: This Lemma is a direct consequence of the fact that due to the fact that the basis
is orthonormal, it induces a transformation that is an isomorphism. 2
The transformation that is discussed in this section refers to
2 -signals and the correspond-
ing transformation of systems actually concerns the transformation of the
2 -behaviour or
graph of a dynamical system. However, this same orthogonal basis for
2 can also be em-
ployed to induce a transformation of (quasi-)stationary stochastic processes to the transform
domain, as briey considered in the next section.
For a stationary stochastic process v with stable spectral factor H_v, denote by h_v the
impulse response of H_v, i.e.

h_v = H_v(q) Δ(t).    (6.43)

Let w(t) = P_wv(q) v(t) with P_wv a stable scalar transfer function. Then

h̃_w = P̃_wv h̃_v    (6.45)

with h_w, h_v ∈ ℓ2 the impulse responses of stable spectral factors of Φ_w(ω), Φ_v(ω),
respectively. Similarly to Lemma 6.4.5 we can now formulate some properties of stochastic
processes.
Lemma 6.5.1 Let w, z be m-dimensional stationary stochastic processes, satisfying
w(t) = Σ_{k=0}^{∞} h_w(k) e(t−k) and z(t) = Σ_{k=0}^{∞} h_z(k) e(t−k), with {e(t)} a
scalar-valued unit variance white noise process. Then

E[w(t) z^T(t)] = (1/2π) ∫_{−π}^{π} h̃_w^T(e^{−iω}) h̃_z(e^{iω}) dω.    (6.46)
The previous lemmas can simply be shown to hold also in the case of quasi-stationary
signals. To this end we use the operator Ē := lim_{N→∞} (1/N) Σ_{t=0}^{N−1} E, where E
stands for expectation.
Theorem 6.6.1 The matrix R(n) defined in (6.26) is a block-Toeplitz matrix, being the
covariance matrix related to the spectral density function Φ̃u(ω) := H̃u^T(e^{−iω}) H̃u(e^{iω}).

Proof: The (j, ℓ) block-element of matrix R(n) is given by Ē x_j(t) x_ℓ^T(t). Since
x_j(t) = Gb^j(q) I_{nb} x_0(t), it follows with Lemma 6.5.2 that

Ē x_j(t) x_ℓ^T(t) = (1/2π) ∫_{−π}^{π} h̃_{x0}^T(e^{−iω}) [G̃b^T(e^{−iω})]^j [G̃b(e^{iω})]^ℓ h̃_{x0}(e^{iω}) dω.

Since x_0(t) = q^{-1} V_0(q) u(t), we can write h_{x0}(t) = q^{-1} V_0(q) Hu(q) Δ(t). Since Hu is
scalar, we can write h_{x0}(t) = Hu(q) q^{-1} V_0(q) Δ(t) = Hu(q) h_{v0}(t), with h_{v0}(t) the impulse
response of the transfer function q^{-1} V_0(q).
Applying Proposition 6.4.1 now shows h̃_{x0} = H̃u h̃_{v0} = H̃u. The latter equality follows from
h̃_{v0} = I_{nb}, as the impulse response of q^{-1} V_0(q) exactly matches the first nb basis functions
in the Hambo domain. Consequently

Ē x_j(t) x_ℓ^T(t) = (1/2π) ∫_{−π}^{π} e^{iω(j−ℓ)} H̃u^T(e^{−iω}) H̃u(e^{iω}) dω = (1/2π) ∫_{−π}^{π} e^{iω(j−ℓ)} Φ̃u(ω) dω.
Remark 6.6.2 In Wahlberg (1991), treating the (first order) Laguerre case, the corresponding
Toeplitz matrix is the covariance matrix related to the spectral density

Φu(ξ(ω))  with  e^{iξ(ω)} = (e^{iω} + a)/(1 + a e^{iω}).    (6.51)

This implies that in that case a variable transformation

e^{iω} ↦ (e^{iω} + a)/(1 + a e^{iω})    (6.52)

is involved, or equivalently

e^{iω} ↦ (1 + a e^{iω})/(e^{iω} + a).    (6.53)

Using the balanced realization of a first order inner function, (6.13), this implies that in the
setting of this paper the variable transformation involved is given by e^{iω} ↦ N(e^{iω}), while
N has a minimal balanced realization

(A, B, C, D) = (−a, √(1 − a²), √(1 − a²), a).
The following Proposition bounds the eigenvalues of the block-Toeplitz matrix R(n).
Proposition 6.6.3 Let the block-Toeplitz matrix R(n) defined in (6.26) have eigenvalues
λ_j(R(n)). Then
(a) For all n, the eigenvalues of R(n) are bounded by

ess inf_ω Φu(ω) ≤ λ_j(R(n)) ≤ ess sup_ω Φu(ω);

(b) lim_{n→∞} max_j λ_j(R(n)) = ess sup_ω Φu(ω).    □

In the sequel of this paper we will assume that the input spectrum is bounded away from
zero, i.e. ess inf_ω Φu(ω) ≥ c > 0. Consequently

‖θ* − θ0‖2 ≤ ‖R(n)^{-1}‖2 · ‖Ē[φ(t) φ_e^T(t)]‖2 · ‖θ_e‖2,    (6.55)

with φ_e(t) the regression vector built from the tail basis functions V_k, k ≥ n, and
where for a (matrix) operator T, ‖T‖2 refers to the induced operator 2-norm. For simplicity
of notation we have skipped the dependence of θ* (and θ0) on n.
We can now formulate the following upper bound on the bias error.
Proposition 6.6.4 Consider the identification set-up as discussed in section 6.3. Then

‖θ* − θ0‖2 ≤ (ess sup_ω Φu(ω) / ess inf_ω Φu(ω)) · ‖θ_e‖2,    (6.56)

where ‖θ_e‖2 = √(Σ_{k=n}^{∞} L_k^{(0)} L_k^{(0)T}).    □

Corresponding bounds hold for the transfer function estimate (Proposition 6.6.5):
(a) For each frequency ω1,

|G(e^{iω1}, θ*) − G0(e^{iω1})| ≤ ‖V0(e^{iω1})‖∞ [ ‖θ* − θ0‖1 + ‖θ_e‖1 ],

where ‖V0(e^{iω1})‖∞ is the ∞-induced operator norm of the matrix V0(e^{iω1}) ∈ C^{nb×1},
i.e. the maximum absolute value over the elements in V0(e^{iω1}).
(b) The H2-norm of the model error is bounded by:

‖G(z, θ*) − G0(z)‖_{H2} = √(‖θ* − θ0‖2² + ‖θ_e‖2²) ≤ {1 + ess sup_ω Φu(ω)/ess inf_ω Φu(ω)} ‖θ_e‖2.    (6.57)
Note that this latter bound on the bias in the transfer function estimate, as well as the
previously derived bound, are dependent on the basis functions chosen. The factor ‖θ_e‖2²
is determined by the convergence rate of the series expansion of G0 in the generalized basis.
The closer the dynamics of the system G0 are to the dynamics of the inner transfer
function Gb, the faster the convergence rate will be. An upper bound for this convergence
rate is derived in the following Proposition, based on the results in Heuberger et al. (1992).
Proposition 6.6.6 Let G0(z) have poles ξ_i, i = 1, ⋯, ns, and let Gb(z) have poles α_j,
j = 1, ⋯, nb.
Denote

ρ := max_i Π_{j=1}^{nb} |(ξ_i − α_j)/(1 − ξ_i ᾱ_j)|.    (6.58)

Then the expansion coefficients decay exponentially: there exists a constant c > 0 such that
‖L_k^{(0)}‖2 ≤ c ρ^k for all k, so that the tail ‖θ_e‖2 decays with rate ρ^n.    □
Proof: A sketch of proof is given in the Appendix. For a detailed proof the reader is
referred to Heuberger (1991) and Heuberger and Van den Hof (1996).    □
Note that when the two sets of poles converge to each other, ρ will tend to 0, the upper
bound on ‖θ_e‖2 will decrease drastically, and the bias error will reduce accordingly. The
above result clearly shows the important contribution that an appropriately chosen set of
basis functions can have in achieving a reduction of the bias in estimated transfer functions.
The results in (6.56), (6.57) show that we achieve consistency of the parameter and transfer
function estimates as n → ∞, provided that the input spectrum is bounded away from 0
and ‖θ_e‖2 → 0 for n → ∞. The latter condition is guaranteed if G0 ∈ H2.
For the FIR case, corresponding with Gb(z) = z^{-1}, we know that under specific
experimental conditions the finite number of expansion coefficients can also be estimated
consistently, irrespective of the tail. This situation can also be formulated for the generalized
case: if the stable spectral factor Hu of the input spectrum is inner, then θ*(n) = θ0 for all n.
Proof: Under the given condition it can simply be verified that Φ̃u(ω) = I_{nb}. This implies
that the block-Toeplitz matrix R(n) = I, and that for all n ≥ 1, R12(n) = 0. Employing
this relation in the proofs of Propositions 6.6.4, 6.6.5 shows the results.    □
Note that a special case of the situation of an inner Hu is obtained if the input signal u
is uncorrelated (white noise). In that situation Hu = 1 and consequently H̃u = I_{nb}, being
inner.
Concerning the variance, standard asymptotic normality results provide

√N (θ̂_N(n) − θ*(n)) →_{As} N(0, Q_n),

where N(0, Q_n) denotes a Gaussian distribution with zero mean and covariance matrix Q_n.
For output error identification schemes, as applied in this paper, the asymptotic covariance
matrix satisfies:

Q_n = [Ē φ(t) φ^T(t)]^{-1} [Ē φ̄(t) φ̄^T(t)] [Ē φ(t) φ^T(t)]^{-1}    (6.61)

with φ̄(t) = Σ_{i=0}^{∞} h0(i) φ(t+i), and h0(i) the impulse response of the corresponding
transfer function H0.
We know that, according to Theorem 6.6.1, the block-Toeplitz matrix R(n) = Ē φ(t) φ^T(t)
is related to the spectral density function Φ̃u(ω). For the block-Toeplitz matrix
P(n) = Ē φ̄(t) φ̄^T(t) we can formulate a similar result.
Lemma 6.6.8 The block-Toeplitz matrix P(n) = Ē φ̄(t) φ̄^T(t) is the covariance matrix
related to the spectral density function Φ̃u(ω) Φv(ω).    □
Proof: The proof follows along similar lines as the proof of Theorem 6.6.1.    □
From the asymptotic covariance of the parameter estimate, we can derive an expression for
the transfer function estimate:

cov(G(e^{iω1}, θ̂_N), G(e^{iω2}, θ̂_N)) ≈ (1/N) Γ_n^T(e^{iω1}) Q_n Γ_n(e^{−iω2})    (6.62)

with Γ_n(e^{iω}) the gradient of G(e^{iω}, θ) with respect to θ.

Theorem 6.6.9 Assume the spectral density Φu(ω) to be bounded away from zero and
sufficiently smooth. Then, for N, n → ∞, n²/N → 0:

(N/n) cov(G(e^{iω1}, θ̂_N), G(e^{iω2}, θ̂_N)) →
    0                                                          for Gb(e^{iω1}) ≠ Gb(e^{iω2}),
    (1/nb) V0^T(e^{−iω1}) V0(e^{iω1}) Φv(ω1)/Φu(ω1)            for ω1 = ω2.

For ω1 = ω2 = ω this can be summarized as

cov(G(e^{iω}, θ̂_N)) ≈ (n/(N·nb)) V0^T(e^{−iω}) V0(e^{iω}) Φv(ω)/Φu(ω)    (6.63)
which is the noise-to-input-signal ratio weighted with an additional weighting factor that is
determined by the basis functions. This additional weighting, which is not present in the
case of FIR estimation, again generalizes the weighting that is also present in the case of
Laguerre basis functions, see Wahlberg (1991). Since the frequency function V0(e^{iω}) has a
low-pass character, it ensures that the variance will have a roll-off at high frequencies. This
is unlike the case of FIR estimation, where the absolute variance generally increases with
increasing frequency.
The role of V0 in this variance expression clearly shows that there is a design variable
involved that can be chosen also from a point of view of variance reduction. In that case
V0 has to be chosen in such a way that it reduces the effect of the noise (Φv(ω)) in those
frequency regions where the noise is dominating.
The result of the theorem also shows that - for nb = 1 - the transfer function estimates will be
asymptotically uncorrelated. In that case it can simply be shown that Gb(e^{iω1}) = Gb(e^{iω2})
implies ω1 = ω2 for ω1, ω2 ∈ [0, π]. In the case nb > 1 this latter situation is not guaranteed.
Figure 6.1: Simulated noise disturbed output signal y(t) (solid) and noise signal v(t)
(dashed) on time interval t = 1, ⋯, 200.
Figure 6.2: Bode amplitude plot of simulated system G0 (solid) and of basis functions V0
(four-dimensional) (dashed).
components of V0 , i.e. the rst four basis functions. Note that all other basis functions
will show the same Bode amplitude plot, as they only dier in multiplication by a scalar
inner function, which does not change its amplitude. We have used 5 dierent realizations
of 1200 data points to estimate 5 dierent models. Their Bode amplitude plots are given
in Figure 6.3 and the corresponding step responses in Figure 6.4.
Figure 6.5 shows the relevant expressions in the asymptotic variance expression (6.63). This
refers to the plots of V0T V0 and of the noise spectrum v () as well as their product. Since
u () = 1, this latter product determines the asymptotic variance of the estimated transfer
function.
To illustrate the power of the identification method, we have made a comparison with the
identification of 5th order (least squares) output error models, dealing with a parametrized
prediction error

ε(t, θ) = y(t) − (b1 q^{-1} + b2 q^{-2} + ⋯ + b5 q^{-5}) / (1 + a1 q^{-1} + ⋯ + a5 q^{-5}) · u(t).    (6.66)
Figure 6.3: Bode amplitude plot of simulated system G0 (solid) and of five estimated models
with n = 5, N = 1200, using five different realizations of the input/output data (dashed).
Figure 6.4: Step response of simulated system G0 (solid) and of five estimated models with
n = 5, N = 1200, using five different realizations of the input/output data (dashed).
Figure 6.5: Bode amplitude plot of (1/nb) V0^T V0 (dashed), spectrum Φv(ω) (dash-dotted)
and their product (solid).
Figure 6.6: Bode amplitude plot of G0 (solid) and five 5th order OE models (dashed).
Figure 6.7: Step response of G0 (solid) and five 5th order OE models (dashed).
6.8 Discussion
In this chapter we have analyzed some asymptotic properties of linear estimation schemes
that identify a finite number of expansion coefficients in a series expansion of a linear stable
transfer function, employing recently developed generalized orthogonal basis functions.
These basis functions generalize the well known pulse, Laguerre and Kautz basis functions
and are shown to provide flexible design variables that, when properly chosen, provide fast
convergence of the series expansion. In an identification context this implies that only a few
coefficients have to be estimated to obtain accurate estimates, while simple linear regression
schemes can be used. Both bias and variance errors are analyzed and error bounds are
established.
As the accuracy of the chosen basis functions can substantially improve the identification
results in both bias and variance, the introduced method points to the use of iterative
procedures, in which the basis functions are updated iteratively. Previously estimated models
can then be used to dictate the poles of the basis. Such iterative methods have already
been applied successfully in practical experiments, see e.g. De Callafon et al. (1993).
The flexibility of the introduced basis functions provides a means to introduce uncertain
a priori knowledge into the identification procedure. In contrast with other identification
techniques, where a priori knowledge definitely has to be certain, this a priori knowledge
(i.e. the system poles) is allowed to be approximate. The consequence is simply that the
better the a priori knowledge is, the higher the accuracy of the identified models will be.
Apart from the identification of nominal models, the basis functions introduced here have
also been applied in the identification of model error bounds, see De Vries (1994) and
Hakvoort and Van den Hof (1994).
Appendix
Lemma 6A.1 Consider a scalar inner transfer function Gb generating an orthogonal basis
as discussed in section 6.2, with V0 and ΓN as defined by (6.11) and (6.33) respectively. Then
Proof:

α(n)^T R(n) α(n) = Σ_{k=0}^{n−1} Σ_{ℓ=0}^{n−1} α_k^T c_{kℓ} α_ℓ   (6A.6)

with

c_{kℓ} = (1/2π) ∫_{−π}^{π} V_k(e^{iω}) V_ℓ^T(e^{−iω}) Φ_u(ω) dω.   (6A.7)

Denoting Λ(e^{iω}) := Σ_{k=0}^{n−1} α_k^T V_k(e^{iω}), it follows that

α(n)^T R(n) α(n) = (1/2π) ∫_{−π}^{π} Λ(e^{iω}) Φ_u(ω) Λ^*(e^{iω}) dω.   (6A.8)

Since ess inf_ω Φ_u(ω) · α(n)^T α(n) ≤ α(n)^T R(n) α(n) ≤ ess sup_ω Φ_u(ω) · α(n)^T α(n), it follows
that

ess inf_ω Φ_u(ω) ≤ ‖R(n)‖_2 ≤ ess sup_ω Φ_u(ω).   (6A.10)

The latter equation can be verified by realizing that, since R(n) is symmetric, there exists
a Q(n) satisfying R(n) = Q(n)Q(n)^T, leading to
Part (b).
The Hermitian form Tn := α^T(n) R(n) α(n) can be written as

Tn = ᾱ^T(n) diag{λ_1^{(n)}, ⋯, λ_{nb·n}^{(n)}} ᾱ(n)

through a unitary transformation preserving the norm, i.e. ᾱ^T(n) ᾱ(n) = α^T(n) α(n). Conse-
quently

α(n)^T R(n) α(n) / (α(n)^T α(n)) ≤ max_i λ_i^{(n)},   (6A.11)

lim_{n→∞} max_i λ_i^{(n)} = ess sup_ω Φ_u(ω).

As a result

‖R^{1/2}(n)‖_2^2 ≤ lim_{n→∞} max_j λ_j(R(n)),

which by Proposition 6.6.3(b) is equal to ess sup_ω Φ_u(ω). This proves the result. □
where ‖·‖_1 refers to the induced ℓ1 matrix norm and the ℓ1-norm, respectively. It follows
from the fact that Gb is inner that

|G(e^{iω1}, θ) − G0(e^{iω1})| ≤ ‖V0^T(e^{iω1})‖_1 · ‖ [θ0^T  θe^T]^T ‖_1
= ‖V0^T(e^{iω1})‖_1 [ ‖θ0‖_1 + ‖θe‖_1 ] = ‖V0(e^{iω1})‖_∞ [ ‖θ0‖_1 + ‖θe‖_1 ].

Part (a) of the Proposition now follows by substituting the error bound obtained in Propo-
sition 6.6.4, and using the inequality ‖θ0‖_1 ≤ √(nb·n) ‖θ0‖_2.
Because of the orthonormality of the basis functions on the unit circle, it follows that

‖G(z, θ) − G0(z)‖_{H2} = ‖ [θ0^T  θe^T]^T ‖_2,

(c) Let {ξ_i}_{i=1,⋯,ns} denote the poles of G0. Then the poles of the transformed system are
given by ν_i = {G_b^{−1}(ξ_i)}_{i=1,⋯,ns}, leading to |ν_i| = Π_{j=1}^{nb} |(ξ_i − a_j)/(1 − ξ_i a_j)|.
The first statement is proven in Van den Hof et al. (1994), and the latter implication
follows directly by substituting the appropriate expressions.
Lemma 6A.2 Let v be a scalar-valued stationary stochastic process with rational spectral
density Φ_v(ω). Let e^{iξ} = Gb(e^{iω}). Then

Proof: Let Hv be a stable spectral factor of Φ_v, satisfying Φ_v(ω) = Hv(e^{iω}) Hv(e^{−iω}). Then
Φ_v(ξ) = Hv^T(e^{iξ}) Hv(e^{−iξ}) and it follows that
V0^T(e^{−iω}) Φ_v(ξ) V0(e^{iω}) = V0^T(e^{−iω}) Hv^T(Gb(e^{−iω})) Hv(Gb(e^{iω})) V0(e^{iω}). Using Proposition
6.4.4, this latter expression is equal to Hv(e^{−iω}) V0^T(e^{−iω}) V0(e^{iω}) Hv(e^{iω}). Since Hv is scalar
this proves the lemma. □
Proof of Theorem 6.6.9. Using (6.62), and substituting Γn(e^{iω}), shows that

(1/(nb·n)) Γn^T(e^{iω1}) Qn Γn(e^{iω2}) =
(1/(nb·n)) [V0^T(e^{iω1})  V1^T(e^{iω1})  ⋯  V_{n−1}^T(e^{iω1})] Qn [V0^T(e^{iω2})  V1^T(e^{iω2})  ⋯  V_{n−1}^T(e^{iω2})]^T.

Note that this latter expression can be written as

(1/(nb·n)) V0^T(e^{iω1}) [I  Gb(e^{iω1})I  ⋯  Gb^{n−1}(e^{iω1})I] Qn [I  Gb(e^{iω2})I  ⋯  Gb^{n−1}(e^{iω2})I]^T V0(e^{iω2}).   (6A.15)

Now we evaluate the following expression:

(1/n) [I  Gb(e^{iω1})I  ⋯  Gb^{n−1}(e^{iω1})I] Qn [I  Gb(e^{iω2})I  ⋯  Gb^{n−1}(e^{iω2})I]^T.   (6A.16)

Since Gb is an inner function we can consider the variable transformation

e^{iξ} := Gb(e^{iω}).   (6A.17)

Employing this transformation in the expression (6A.16), this latter expression is equivalent
to:

(1/n) [I  e^{iξ1}I  ⋯  e^{iξ1(n−1)}I] Qn [I  e^{iξ2}I  ⋯  e^{iξ2(n−1)}I]^T.

The convergence results of Hannan and Wahlberg (1989) and Ljung and Yuan (1985) now
show that for n → ∞ this expression converges to

0, if ξ1 ≠ ξ2;   Q(ξ1), if ξ1 = ξ2,   (6A.18)

where Q(ξ) is the spectral density related to the Toeplitz matrix Qn in the limit as n → ∞.
Employing Lemma 6A.2, together with (6.61), it follows that for n → ∞ the spectrum
related to the Toeplitz matrix Qn is given by Φ_v(ξ)Φ_u(ξ)^{−1}. This is due to the fact
that the symbol (spectrum) of a Toeplitz matrix which is the product of several Toeplitz
matrices asymptotically (n → ∞) equals the product of the symbols (spectra) of the
separate Toeplitz matrices, see Grenander and Szego (1958).
Combining this with (6A.15) now shows that, for n → ∞,

(1/(nb·n)) Γn^T(e^{iω1}) Qn Γn(e^{iω2}) = (1/nb) V0^T(e^{iω1}) Q(ξ) V0(e^{iω2})
= 0, if ω1 ≠ ω2;
= (1/nb) V0^T(e^{iω1}) Φ_v(ξ1)Φ_u(ξ1)^{−1} V0(e^{iω1}), if ω1 = ω2.   (6A.19)

Employing Lemma 6A.2 now shows that the resulting expression in (6A.19) becomes:

(1/nb) V0^T(e^{iω1}) Φ_v(ω1)Φ_u(ω1)^{−1} V0(e^{iω1}).

In using Lemma 6A.2 it has to be realized that there will always exist a scalar-valued
stationary stochastic process z with rational spectrum Φ_z such that Φ_z = Φ_v Φ_u^{−1}. □
Bibliography
G.J. Clowes (1965). Choice of the time scaling factor for linear system approximations
using orthonormal Laguerre functions. IEEE Trans. Autom. Contr., AC-10, 487-489.
R.A. de Callafon, P.M.J. Van den Hof and M. Steinbuch (1993). Control relevant identifi-
cation of a compact disc pick-up mechanism. Proc. 32nd IEEE Conf. Decision and
Control, San Antonio, TX, USA, pp. 2050-2055.
D.K. de Vries (1994). Identification of Model Uncertainty for Control Design. Dr. Dis-
sertation, Mechanical Engineering Systems and Control Group, Delft Univ. Technology,
September 1994.
Y. Fu and G.A. Dumont (1993). An optimum time scale for discrete Laguerre network.
IEEE Trans. Autom. Control, AC-38, 934-938.
G.C. Goodwin, M. Gevers and D.Q. Mayne (1991). Bias and variance distribution in
transfer function estimation. Preprints 9th IFAC/IFORS Symp. Identification and Syst.
Param. Estim., July 1991, Budapest, Hungary, pp. 952-957.
M.J. Gottlieb (1938). Concerning some polynomials orthogonal on a finite or enumerable set
of points. Amer. J. Math., 60, 453-458.
U. Grenander and G. Szego (1958). Toeplitz Forms and Their Applications. Univ. Califor-
nia Press, Berkeley, 1958.
R.G. Hakvoort and P.M.J. Van den Hof (1994). An instrumental variable procedure for
identification of probabilistic frequency response uncertainty regions. Proc. 33rd IEEE
Conf. Decision and Control, Lake Buena Vista, FL, pp. 3596-3601.
E.J. Hannan and B. Wahlberg (1989). Convergence rates for inverse Toeplitz matrix forms.
J. Multiv. Analysis, 31, no. 1, 127-135.
P.S.C. Heuberger and O.H. Bosgra (1990). Approximate system identification using system
based orthonormal functions. Proc. 29th IEEE Conf. Decision and Control, Honolulu,
HI, pp. 1086-1092.
P.S.C. Heuberger (1991). On Approximate System Identification with System Based Or-
thonormal Functions. Dr. Dissertation, Delft University of Technology, The Netherlands,
1991.
P.S.C. Heuberger, P.M.J. Van den Hof and O.H. Bosgra (1993). A generalized orthonormal
basis for linear dynamical systems. Proc. 32nd IEEE Conf. Decision and Control, San
Antonio, TX, pp. 2850-2855.
P.S.C. Heuberger, P.M.J. Van den Hof and O.H. Bosgra (1995). A generalized orthonormal
basis for linear dynamical systems. IEEE Trans. Autom. Control, Vol. AC-40, pp.
451-465.
P.S.C. Heuberger and P.M.J. Van den Hof (1996). The Hambo transform: a transformation
of signals and systems induced by generalized orthonormal basis functions. Proc. 13th
IFAC World Congress, San Francisco, CA, July 1996, Vol. I, pp. 103-108.
P.S.C. Heuberger, P.M.J. Van den Hof and B. Wahlberg (Eds.) (2005). Modelling and
Identification with Rational Orthogonal Basis Functions. Springer Verlag.
W.H. Kautz (1954). Transient synthesis in the time domain. IRE Trans. Circ. Theory,
CT-1, 29-39.
R.E. King and P.N. Paraskevopoulos (1979). Parametric identification of discrete time
SISO systems. Int. J. Control, 30, 1023-1029.
Y.W. Lee (1933). Synthesis of electrical networks by means of the Fourier transforms of
Laguerre functions. J. Mathem. Physics, 11, 83-113.
Y.W. Lee (1960). Statistical Theory of Communication. John Wiley & Sons, Inc., New
York, NY.
L. Ljung and Z.D. Yuan (1985). Asymptotic properties of black-box identification of transfer
functions. IEEE Trans. Autom. Control, AC-30, 514-530.
L. Ljung (1987). System Identification - Theory for the User. Prentice Hall, Englewood
Cliffs, NJ.
Y. Nurges (1987). Laguerre models in problems of approximation and identification of
discrete systems. Autom. and Remote Contr., 48, 346-352.
Y. Nurges and Y. Yaaksoo (1981). Laguerre state equations for a multivariable discrete
system. Autom. and Remote Contr., 42, 1601-1603.
G. Szego (1975). Orthogonal Polynomials. Fourth Edition. American Mathematical Soci-
ety, Providence, RI, USA.
P.M.J. Van den Hof, P.S.C. Heuberger and J. Bokor (1994). System identification with
generalized orthonormal basis functions. Proc. 33rd IEEE Conf. Decision and Control,
Lake Buena Vista, FL, pp. 3382-3387.
B. Wahlberg (1990). On the Use of Orthogonalized Exponentials in System Identification.
Report LiTH-ISY-1099, Dept. Electr. Eng., Linkoping University, Sweden.
B. Wahlberg (1991). System identification using Laguerre models. IEEE Trans. Automat.
Contr., AC-36, 551-562.
B. Wahlberg (1994a). System identification using Kautz models. IEEE Trans. Autom.
Control, AC-39, 1276-1282.
B. Wahlberg (1994b). Laguerre and Kautz models. Prepr. 10th IFAC Symp. System
Identification, Copenhagen, Denmark, Vol. 3, pp. 1-12.
N. Wiener (1949). Extrapolation, Interpolation and Smoothing of Stationary Time Series.
MIT Press, Cambridge, MA.
Chapter 7
Identification design
7.1 Introduction
The identification methods presented in the previous chapter show a number of user's
choices that - to the experimenter - act as design variables, which can essentially influence
the models that are finally identified. Take for instance the following choices that have to
be made:
Choice of model set and model structure;
If we want consistency of the model estimates, then the situation is relatively clear. Some
validation tests concerning this situation are discussed in section 8.3. Of course, consistency
requires a correct choice of the model set and the model structure, as discussed in the
previous chapter; additionally some weak conditions on the experimental situation have
to be satisfied, such as persistence of excitation of the input signal. Note, however, that as an
ultimate objective in systems modelling, consistency is probably not a very realistic goal
to pursue. We definitely have to require from our identification methods that they obtain
consistent estimates in the - artificial - situation that this is indeed possible. However, we
surely have to deal with the question what to require from our models in the situation that
consistency cannot be achieved, e.g. caused by the fact that our models simply are not
able to capture all of the (possibly nonlinear, infinite dimensional, time-varying) process
dynamics.
If consistency is out of the question, probably a realistic and relevant answer to the question
formulated above is
This implies that we should evaluate the quality of our estimated models in relation to
the model application. Applications can e.g. be:
Simulation
Prediction
Diagnosis
Control design
where D reflects all design variables that affect the identified model, such as model set M,
prefilter L, input spectrum Φu.
Heuristically, the bias expression can be interpreted as finding a compromise between two
minimizations, the first one being

θD* = arg min_θ (1/2π) ∫_{−π}^{π} |G0(e^{iω}) − G(e^{iω}, θ)|² Q(ω, θ) dω   (7.2)

and the second being the fitting of |H(e^{iω}, θ)|² to the error spectrum, i.e. the numerator
spectrum in (7.1). In many of the situations we will apply system identification techniques
in order to obtain an accurate estimate of the input-output transfer function G0(e^{iω}).
Therefore we will give a closer look at the bias distribution G(e^{iω}, θ) − G0(e^{iω}).

θD* = arg min_θ ∫ | (G(e^{iω}, θ) − G0(e^{iω})) / G0(e^{iω}) |² |G0(e^{iω})|² Q(ω, θ) dω   (7.4)

which shows that obtaining a small relative error in G at frequencies where |G0| is small
requires a weighting function Q(ω, θ) that is much larger at those frequencies.
Note that in this heuristic analysis the frequency scale is linear, which means that the
decade between say 1 and 10 rad/sec represents a 10 times larger weight than the decade
between 0.1 and 1 rad/sec. To reflect a logarithmic frequency scale, one should compensate
for ω in the weighting function:

|G0(e^{iω})|² Q(ω, θ) dω = ω |G0(e^{iω})|² Q(ω, θ) d(log ω)   (7.5)

To secure a good fit at low frequencies, ω|G0(e^{iω})|² Q(ω, θ) must be much larger there than
at other frequencies.
Prefilter L(q)
is therefore quite straightforward, but may involve some trial and error.
The effect of the sampling interval on Q will be discussed in section 7.3.
Illustration.
Figure 7.1: Bode plot of true system G0 (solid line) and second order OE-model (dashed
line) obtained by applying a high-pass (HP) prefilter L1(q).
Even though heuristic, (7.2) is a quite useful tool to understand and manipulate the bias
distribution. This is illustrated in a continuation of example 5.12.1, also taken from Ljung
(1987).
Example 7.2.1 (Example 5.12.1 continued). We consider again the data generating sys-
tem as in example 5.12.1. The resulting model in the OE-structure (5.183) gave the Bode
plot of figure 5.9. This corresponds to Q(ω) ≡ 1 (θ-independent).
Comparing with (7.5) and (7.4), we see that high frequencies play very little role in the
Bode plot fit, due to the rapid roll-off of |G0(e^{iω})|.
To enhance the high-frequency fit, we filter the prediction errors through a fifth-order high-
pass (HP) Butterworth filter L1(q) with cut-off frequency of 0.5 rad/sec. This is equivalent
to introducing a fixed noise model H(q) = 1/L1(q), and changes the weighting function
Q(ω) to this HP-filter. The Bode amplitude plot of the resulting estimate is given in figure
7.1. Comparing it with figure 5.9, the fit has become slightly better around ω = 1 rad/s.
However, the model has become worse for the lower frequencies, while the second-order
model has problems describing the fourth-order roll-off for ω > 1 rad/s.
Consider now the estimate obtained by the least-squares method in the ARX model struc-
ture. This was depicted in figure 5.9. If we want a better low-frequency fit, it seems
reasonable to counteract the HP weighting function Q(ω, θ) in figure 5.10 by low-pass
(LP) filtering of the prediction errors. We thus construct L2(q) as a fifth order LP Butter-
worth filter with cut-off frequency 0.1 rad/sec. The ARX model structure is then applied to
the input-output data filtered through L2(q). Equivalently, we could say that the prediction
error method is used for the model structure

y(t) = [ (b1 q^{−1} + b2 q^{−2}) / (1 + a1 q^{−1} + a2 q^{−2}) ] u(t) + [ 1 / (L2(q)(1 + a1 q^{−1} + a2 q^{−2})) ] e(t)   (7.6)

The resulting estimate is shown in figure 7.2 and the corresponding weighting function
Q(ω, θ) in figure 7.3. Clearly we have now achieved a much better low-frequency fit in the
frequency range ω ≤ 0.4 rad/s. We may note that the estimates of figure 7.2 (filtered ARX)
and figure 5.9 (unfiltered OE) are quite similar. One should then realize that the filtered
Figure 7.2: Bode plot of true system G0 (solid line) and second order ARX-model (dashed
line) obtained by applying a low-pass (LP) prefilter L2(q).
Figure 7.3: Weighting function Q(ω, θ) = |L2(e^{iω})|² |1 + a1 e^{iω} + a2 e^{2iω}|² corresponding
to the estimate in figure 7.2.
ARX estimate of figure 7.2 is much easier to obtain than the output error estimate of figure
5.9, which requires iterative search algorithms.
Note that the Steiglitz and McBride procedure as discussed in section 5.13 is an attempt
to choose the filter L such that an output error minimization results. The point remains
that for a real output error minimization L should be chosen parameter dependent.
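To see how a prefilter reshapes the bias weighting, the following minimal sketch computes |L(e^{iω})|², which is proportional to the θ-independent part of Q(ω, θ) for a fixed noise model, for the two Butterworth prefilters of the example. The sampling interval Ts = 1 and the digital filter design are assumptions made for illustration.

```python
import numpy as np
from scipy import signal

# Sketch: bias weighting |L(e^{iw})|^2 induced by the two prefilters of
# Example 7.2.1. Ts = 1 s is assumed, so w runs up to pi rad/s.
w = np.linspace(1e-2, np.pi, 500)                       # frequency grid [rad/s]

b_hp, a_hp = signal.butter(5, 0.5 / np.pi, 'highpass')  # L1: 5th-order HP, 0.5 rad/s
b_lp, a_lp = signal.butter(5, 0.1 / np.pi, 'lowpass')   # L2: 5th-order LP, 0.1 rad/s

_, L1 = signal.freqz(b_hp, a_hp, worN=w)
_, L2 = signal.freqz(b_lp, a_lp, worN=w)

Q1 = np.abs(L1) ** 2    # emphasizes the fit above 0.5 rad/s (HP case)
Q2 = np.abs(L2) ** 2    # emphasizes the fit below 0.1 rad/s (LP case)
print(Q1[0], Q1[-1], Q2[0], Q2[-1])
```

Plotting Q1 and Q2 over w reproduces the qualitative shapes of the weighting functions discussed above.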
Do we have an indication of the frequency range over which the process will exhibit
its essential dynamics?
Measurements with an exciting input signal meant for transient analysis. Typical
signals here are steps and/or pulses. Typical information to be extracted from the
data is e.g. the largest and smallest relevant frequencies and the static gain of the
process, i.e. G(z)|_{z=1}, determining the response of the system to a constant input
signal. Moreover, one can use these experiments in order to finally decide which input
and output signals are going to be considered. If the resulting model is going to be
used for control design, it is desirable that the input signals that are going to be
used as control inputs are able to reduce the effect of disturbances on the outputs,
both with respect to amplitude and frequency range. This will guide the choice of
appropriate control input signals.
Additionally it can be verified whether and to which extent (amplitude) the process
dynamics can be considered to be linear. Apart from possible physical restrictions, one
generally will be in favour of applying an input signal with the maximum achievable
power, in order to increase the signal-to-noise ratio at the output. Considering a
possible excitation of nonlinear dynamics, the amplitude of the input signal may
have to be restricted, as mentioned before.
A possible test signal that can be used for determination of static nonlinearities is
the so-called staircase signal, as depicted in figure 7.4. The length of every step
in the signal is chosen in such a way that the process response has more or less
reached its static value. Next to indications on relevant frequencies and static gain,
this experiment can provide an estimate of a static nonlinearity. To this end the
static responses {(u1, y1), (u2, y2), ⋯} are used to estimate a polynomial fit on these
characteristics.
Measurements with an exciting input signal meant for correlation and frequency anal-
ysis. Typical input signals are sinusoids, white noise, PRBS (discussed later). These
experiments can be performed to obtain a more accurate indication of the process
dynamics and the frequency range of interest. They can also be used to determine
the design variables for the final (parametric) identification experiment, such as the
length of the experiment, the type of signal chosen, the sample frequency etc., and to
determine possible time-delays in the process.
The question whether all the experiments mentioned can indeed be performed is strongly
determined by the type of process and the costs that are related to experimentation. When
modelling mechanical servo systems such as robots and optical drives, we generally observe
compact set-ups and fast (high-frequency) dynamics, and generally it will be easily possible to
experiment extensively with the process. Experiments will generally be cheap. However,
looking at processes in the (chemical) process industry, the situation is completely different.
Industrial production processes often have time-constants of the order of hours/days, which
requires extremely long experiments in order to obtain a data set that is sufficiently long
for accurate identification. Moreover, these experiments will generally be rather expensive,
e.g. due to loss of production capacity. In those situations it may be necessary to reduce
the number of experiments to a minimum and to compress a number of the steps discussed
above.
u(t) = c · sign[w(t)]

Ru(τ) = c², τ = 0   (7.7)
      = 0, τ ≠ 0   (7.8)

and consequently

Φu(ω) = c², −π < ω ≤ π.
Therefore the signal u has the spectral properties of white noise, but at the same time it is
a binary signal assuming values ±c. There are two additional attractive properties of this
signal:
u is bounded in amplitude. This is a very favorable property in view of the possible
excitation of nonlinear dynamics in the process. If the process is assumed to be oper-
ating in an operational condition around which a linearization can be made under the
restriction of an amplitude-bound, this amplitude-bound can be used as a constraint
on the input signal. Moreover, the amplitude of physical actuators may be restricted
by the physical configuration (e.g. a valve can open only between fixed limits).
u is binary, and therefore it has maximum signal power under an amplitude bound
constraint. This is advantageous as one may expect that the accuracy of the model
estimates will improve when the input power is increased, thus creating a higher
signal-to-noise ratio at the output.
The first point mentioned is of course also shared by a uniformly distributed white noise
process (and not by a Gaussian white noise!). However, the second property is typical for a
binary signal. A typical such RBS signal is sketched in Figure 7.5(a).
Figure 7.5: (a) Typical RBS with clock period equal to sampling interval (Nc = 1); (b)
RBS with increased clock period Nc = 2.
Figure 7.6: Covariance function Ru(τ) of u(t) as defined in (7.9), having a basic clock period of
Nc sampling intervals.
The increase of the clock-period has a particular influence on the spectral properties of the
signal, as formulated in the next Proposition, a proof of which is added in the appendix.
and

Φu(ω) = (1/Nc) · (1 − cos(Nc·ω)) / (1 − cos ω).   (7.11)
The covariance function of the signal u(t) in this proposition is sketched in figure 7.6.
The expression for the spectrum Φu(ω) is obtained by realizing that the covariance function
Ru(τ) can equivalently be obtained by filtering a white noise process by the linear filter

F(q) = (1/√Nc) (1 + q^{−1} + q^{−2} + ⋯ + q^{−(Nc−1)}) = (1/√Nc) · (1 − q^{−Nc}) / (1 − q^{−1})   (7.12)
In figure 7.7 the spectrum Φu(ω) is sketched for a number of different values of Nc. It can
clearly be verified that for increasing Nc there is a shift of signal power to the low-frequency
part. From a flat signal spectrum for Nc = 1 (white noise property) a shift is made to a
low-frequency signal for Nc > 1, resulting in a signal for Nc = 10 that has negligible power
in the higher frequencies.
Remark 7.3.2 Note that Φu(ω) = 0 for ω = 2πk/Nc, k = 1, .., int(Nc/2). This means that
the spectrum has an increasing number of zero-crossings with increasing value of Nc. Clearly,
one has to be careful with spectrum inversion at those frequency points,
e.g. when using nonparametric (or very high order) identification.
Figure 7.7: Spectrum (1/2π)Φu(ω) of RBS with basic clock period Nc = 1 (solid), Nc = 3
(dash-dotted), Nc = 5 (dotted), and Nc = 10 (dashed).
Remark 7.3.3 The effect on the spectral density that is obtained by changing the clock
period of the RBS to a multiple of the sampling interval can of course also be obtained
by filtering the RBS through a linear low-pass filter. However, a particular advantage of
using the clock period mechanism here is that the resulting signal remains amplitude-
bounded. As mentioned before, this has advantages both from a viewpoint of avoiding the
excitation of nonlinear system dynamics and from a viewpoint of avoiding actuator wear.
where {w(t)} is a stochastic white noise process, and R(q) a stable linear filter that can
be used by the experimenter to influence the spectral density of {u(t)}. The choice
R(q) = 1 is equivalent to the situation considered before.
2. Consider the random binary signal u(t) with values ±c, determined by Pr(u(t) =
u(t−1)) = p, and Pr(u(t) = −u(t−1)) = 1 − p, with p the non-switching probability
(0 < p < 1). With p = 1/2 this signal has comparable properties as the previously
mentioned RBS with R(q) ≡ 1. Choosing p ≠ 1/2 gives the possibility to influence
the spectral density of the signal. This RBS is analyzed in Tulleken (1990), where it
is shown that
Ru(τ) = c² (2p − 1)^{|τ|};  and

Φu(ω) = (1 − q̄²) / (1 + q̄² − 2 q̄ cos(ω));  with q̄ = 2p − 1.

A sketch of the spectral density of u(t) for several values of p is given in figure 7.8.
Note that these spectra do not have the zero crossings that are present in a (P)RBS
with extended clock period. This is considered an advantage of this probabilistic way
of influencing the signal spectrum. Note that for non-switching probabilities p > 0.5
the low-frequency behaviour of the signals is emphasized; for p < 0.5 it is also possible to
construct spectral densities with high-frequency emphasis.
Figure 7.8: Spectrum (1/2π)Φu(ω) for RBS with non-switching probabilities p = 0.5 (solid),
p = 0.75 (dashed), p = 0.90 (dash-dotted) and p = 0.3 (dotted).
u(t) = Σ_{k=1}^{r} α_k sin(ω_k t + φ_k)

CF(u) = ‖u‖_∞ / ‖u‖_2

i.e. the ratio of infinity-norm and 2-norm of the signal. For more information on general
input design the reader is referred to Pintelon and Schoukens (2001) and Godfrey (1993).
The random binary signal (7.13) is studied in Schoukens et al. (1995).
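The crest factor is straightforward to evaluate numerically; the sketch below does so for a multisine with random phases. The excited frequencies, the number of components r and the phase choice are illustrative assumptions.

```python
import numpy as np

# Sketch: crest factor CF(u) = ||u||_inf / ||u||_2 (rms) of a multisine
# u(t) = sum_k sin(w_k t + phi_k) with random phases.
rng = np.random.default_rng(2)
t = np.arange(4096)
r = 10
wk = 2 * np.pi * np.arange(1, r + 1) / 256     # excited frequencies (illustrative)
phik = rng.uniform(0, 2 * np.pi, r)            # random phases

u = np.sum(np.sin(np.outer(t, wk) + phik), axis=1)
crest = np.max(np.abs(u)) / np.sqrt(np.mean(u ** 2))
print("crest factor:", crest)
```

Binary signals attain the minimal crest factor of 1; for multisines, optimizing the phases (see e.g. Pintelon and Schoukens, 2001) can reduce the crest factor considerably compared to random phases.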
In case we consider consistency of the identified models as an achievable goal, which can
be reached by choosing the correct model structure, the input signal can be chosen so as to
minimize some measure of the variance of the parameter estimates. This is the problem of
optimal input, or experiment, design, and it has been given wide attention in the literature,
see e.g. Goodwin and Payne (1977). However, in many practical situations the user can
hardly assume that the true process is linear and of finite order. Identification must then
be considered as a method of model approximation, in which the identified model will be
dependent on the experimental conditions. A general advice then is to let the input have
its major energy in the frequency band that is of interest for the intended application of
the model.
Discussing the problem of how to choose the sampling frequency, we have to distinguish
two different situations:
(b) The sampling frequency that is used in the identification procedure, being equal to
the sampling frequency for which the discrete-time model is built.
process will hardly be contained in the data set. A value of Ts that is larger than the
essential time-constants of the process (corresponding to a sampling frequency ωs that is
smaller than the smallest relevant frequency) would lead to sampled signals that cannot
contain appropriate information on the relevant dynamics in the process. In this case an
appropriate choice of Ts will have to be a trade-off between disturbance reduction and the
incorporation of information on the relevant process dynamics. A general rule-of-thumb for
the lower bound on the total experimentation time TN is given by 5 to 10 times the largest
relevant time-constant in the process, where this time-constant is determined as 1/ω_min with
ω_min the smallest frequency of interest.
The question whether - for a given sampling interval - information on specific process dy-
namics is present in the sampled signals can be answered by employing the Theorem of
Shannon. This states that a sampled signal with sampling frequency ωs can exactly repro-
duce a continuous-time signal provided that the continuous-time signal is band-limited, i.e.
it has no frequency content for frequencies ω ≥ ωs/2. This is visualized in the expression
for the Fourier transform of ud(k) := u(kTs):

Ud(ω) = (1/Ts) Σ_{k=−∞}^{∞} Uc(ω − 2πk/Ts)   (7.14)

where Uc is the Fourier transform of the underlying continuous-time signal. If the continuous-
time signal does not satisfy the restriction that Uc(ω) = 0 for ω ≥ ωs/2 = π/Ts, then
reproducing the signal uc(t) from its sampled version ud(k) leads to a distortion where the
frequency components in the original continuous-time signal with frequencies ω > ωs/2
appear as low-frequency contributions in the reconstructed signal. This effect, which is called
frequency-folding or aliasing, has to be prevented by taking care that all contin-
uous signals that are being sampled are first band-limited through an operation of linear
filtering through a (continuous-time) anti-aliasing filter. This anti-aliasing filter has to
remove all frequency components in the continuous-time signal with frequencies ω ≥ ωs/2.
In order to reduce this effect of aliasing, a rule-of-thumb for the upper bound of the sampling
interval is often given by:

ωs ≈ 10 ωb   (7.15)

with ωb the bandwidth¹ of the process. For a first-order system having ωb = 1/τ this rule
of thumb can also be rewritten as

Ts ≈ τ_set,95 / 5   (7.16)

with τ_set,95 the 95%-settling time of the step response of the process (τ_set,95 ≈ 3τ for a first
order process).
When we are going to construct discrete-time models, additional arguments play a role when
choosing a specific sampling interval. The first one is the aspect of numerical accuracy
and sensitivity. Consider a continuous-time system having a state-space description with
state matrix Ac. If we apply a continuous-time input signal that is piecewise constant between
the sampling instants, we can formulate a discrete-time system relation leading to sampled
output signals at the sampling instants. It can simply be verified that the (discrete-time)
system that produces the correct sampled output signal has a discrete-time state-space
¹The bandwidth is defined as the maximum frequency for which the magnitude of the frequency
function reaches the level of 1/√2 times its static value.
description with state matrix Ad = e^{Ac·Ts}. If Ts approaches 0 then Ad will approach the
identity matrix, and consequently all poles of the discrete-time system will cluster around
the point 1. This causes numerical difficulties. The difference equations related to the
models now describe relations between sample values that are so close that they hardly
vary within the range of the order of the difference equation.
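This clustering effect is easily demonstrated numerically, as in the minimal sketch below; the continuous-time state matrix Ac is an illustrative example.

```python
import numpy as np
from scipy.linalg import expm

# Sketch: the discrete-time poles of Ad = e^{Ac Ts} cluster around z = 1
# as the sampling interval Ts tends to zero.
Ac = np.array([[0.0, 1.0],
               [-1.0, -0.5]])        # continuous poles at -0.25 +/- 0.97j

for Ts in [1.0, 0.1, 0.01]:
    Ad = expm(Ac * Ts)
    print(f"Ts = {Ts:5.2f} -> discrete poles {np.round(np.linalg.eigvals(Ad), 4)}")
```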
Another implication of choosing a small sampling interval is related to the principle of the
prediction error identification methods that we have discussed. The error criterion that is
involved is determined by the one-step-ahead prediction error of models. As the sampling
interval decreases, this one-step-ahead interval becomes smaller. The result is that the
model fit may be concentrated in the high-frequency range. This can be illustrated by
considering the asymptotic bias expression:

θ* = arg min_θ ∫_{−π/Ts}^{π/Ts} |G0(e^{iωTs}) − G(e^{iωTs}, θ)|² Q(ω, θ) dω   (7.17)

where we have incorporated the Ts-dependence. As Ts tends to zero, the frequency range
over which the integral is taken increases. Generally the contribution of the difference
|G0(e^{iωTs}) − G(e^{iωTs}, θ)| will become smaller as ω increases, due to a roll-off in the transfer
functions. However, in situations where the noise model is coupled to the dynamics in
G(e^{iωTs}, θ), as is the case for an ARX model structure, then the product
data-acquisition one uses a sample frequency that is as high as possible. All kinds of data
processing operations can then be performed on this highly sampled signal. Before the data
is actually used for identification of parametric models, one then reduces the sample frequency
by digital prefiltering and decimation to a level that is motivated from a point of view of
model application. This strategy is suggested e.g. in Ljung (1987) and Backx and Damen
(1989).
Nonzero means and drifts can be removed from the data by filter operations. Removing
of nonzero means can be done by correcting the signals by subtracting the present static
values or an estimate thereof, in the sense of their sample means ū = (1/N) Σ_{t=1}^{N} u(t),
ȳ = (1/N) Σ_{t=1}^{N} y(t). Standard identification methods are then applied to the corrected
data (u(t) − ū, y(t) − ȳ). Similarly, slowly varying disturbances can be removed from the
data by high-pass filtering of the signals. In order to avoid the introduction of phase-shifts
during this filtering, use can be made of symmetrical non-causal filters that operate off-line
on the data sequence.
There are several ways of estimating trends in data and of incorporating these effects in
the models to be estimated, e.g. using noise models with integration, which is equivalent
to differencing the data. For more details the reader is referred to Ljung (1987) and
Soderstrom and Stoica (1989).
Scaling of signals
Input and output signals of physical processes will generally have numerical values that are
expressed in different units and different ranges, depending on whether we deal
with mbars, cm's, seconds etc. In order to arrive at normalized transfer functions, signals
are scaled in such a way that they exhibit an equal power with respect to their numerical
values. For multivariable systems this scaling problem becomes even more pronounced, as
different signal amplitudes then automatically lead to a different weighting of the signals
in the prediction error criterion. In other words: signals with larger numerical values will
then dominate over signals with smaller numerical values.
Compensation of time-delays
Information from previous experiments concerning possible time-delays present in the pro-
cess can now be used to compensate the data for these time-delays by shifting input and
output signals with respect to each other. As a result the time-delays do not have to be
parametrized in the identification procedure. Note that in multivariable systems with m
inputs and p outputs generically m + p − 1 time delays can be corrected for in this way, while
the maximum number of time delays present is equal to the number of scalar transfers, i.e.
p · m. Only for m = 1 or p = 1 can all possibly occurring time-delays be corrected by
shifting the signals.
Decimation
In addition to the discussion concerning the choice of sampling frequency, post-processing
of the signals may contain a step of further reduction of the sampling frequency, called
decimation. In this final step the sampling frequency is brought to the value that is desired
from a viewpoint of discrete-time model description and model application. This reduction of
sampling frequency, or enlargement of the sampling interval, again has to be preceded by
an anti-aliasing filter, which in this case is a discrete-time filter. Note that properties of
input signals that are of importance from an identification point of view (see sections 7.2,
7.3) are formulated for the sampling frequency as obtained after this decimation step. If
a PRBS is used as input signal, and in view of the identification procedure one requires a
constant input spectrum, then the clock period of the PRBS will have to be chosen equal
to the sampling interval after decimation.
∂VN(θ, Z^N)/∂θ = −fN + RN θ   (7.22)

G(q, θ) = (b0 + b1 q^{−1} + ⋯ + b_{nb} q^{−nb}) / (1 + a1 q^{−1} + ⋯ + a_{na} q^{−na})   (7.23)

and one would like to impose the static gain s := G(z)|_{z=1} of this model, this can be done
by incorporating the restriction

(b0 + b1 + ⋯ + b_{nb}) / (1 + a1 + ⋯ + a_{na}) = s   (7.24)

or equivalently:

w(θ) := γ^T θ − s = 0   (7.25)

with

γ^T = [−s  −s  ⋯  −s | 1  1  ⋯  1].

Minimization of VN(θ, Z^N) under the constraint w(θ) = 0 is now obtained by:

∂VN(θ, Z^N)/∂θ + λγ = 0   (7.29)
w(θ) = 0   (7.30)

which two equations have to be satisfied by θ̂N and λ. This shows that the parameter
estimate is obtained by solving

[ RN  γ ; γ^T  0 ] [ θ̂N ; λ ] = [ fN ; s ].   (7.31)
The parameter estimate for the constrained problem can be calculated directly, without
requiring a complex optimization procedure. This is induced by the fact that the constraint
- like the model structure - is linear in the parameters.
Note that in the above expression for θ̂N, the unconstrained least-squares estimate R_N^{−1} fN
appears explicitly in the right hand side of the expression.
For well-definedness of the constrained least-squares problem, it is necessary that the set of
equations (7.31) remains uniquely solvable. This is guaranteed by requiring that γ^T R_N^{−1} γ ≠ 0.
Constrained least-squares identification by restricting the static gain of the model has also
been discussed in Inouye and Kojima (1990).
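Solving (7.31) amounts to one linear system in the parameters and the Lagrange multiplier, as in the minimal sketch below; the numerical values of RN, fN, γ and s are illustrative only.

```python
import numpy as np

# Sketch of the constrained least-squares estimate (7.31): given the normal-
# equation quantities R_N and f_N, the constraint vector gamma and the desired
# static gain s, solve the extended linear system for theta_hat and lambda.
def constrained_lsq(R, f, gamma, s):
    n = len(f)
    A = np.block([[R, gamma[:, None]],
                  [gamma[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([f, [s]])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]           # parameter estimate and Lagrange multiplier

# Illustrative 3-parameter problem:
R = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
f = np.array([1.0, 0.5, 0.2])
gamma = np.array([-1.0, 1.0, 1.0])   # encodes the (linear) static-gain constraint
theta, lam = constrained_lsq(R, f, gamma, s=1.0)
print(theta, gamma @ theta)          # the constraint gamma^T theta = s is met
```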
Then by following the same analysis as before, the constrained least-squares estimate will
be determined by

[ RN  Γ ; Γ^T  0 ] [ θ̂N ; λ ] = [ fN ; g ],   (7.34)

where now λ is an r-vector of Lagrange multipliers, and Γ a (na + nb + 1) × r matrix given
by

Γ = [ −g1 e^{−iω1}      ⋯  −gr e^{−iωr}
      ⋮                     ⋮
      −g1 e^{−i·na·ω1}   ⋯  −gr e^{−i·na·ωr}
      1                  ⋯  1
      e^{−iω1}           ⋯  e^{−iωr}
      ⋮                     ⋮
      e^{−i·nb·ω1}       ⋯  e^{−i·nb·ωr} ]   (7.35)

The parameter estimate is given by

θ̂N = R_N^{−1} fN + R_N^{−1} Γ [Γ^T R_N^{−1} Γ]^{−1} [g − Γ^T R_N^{−1} fN].   (7.36)

Well-definedness of this parameter estimate is again restricted to the situation that the
matrix Γ^T R_N^{−1} Γ is nonsingular. The dimension of this square matrix is equal to the number
of constraints r that has been imposed. It is intuitively clear that whenever the number of
constraints becomes too large, they can no longer be met by the restricted complexity
model. Singularity of the matrix will definitely occur whenever the number of constraints
r exceeds the number of parameters to be estimated. In that situation Γ becomes a "fat"
matrix, and singularity of Γ^T R_N^{−1} Γ is obvious.
Data preprocessing
Appendix
PRBS signals
Definition 7A.1 PRBS.
Let x be a binary state vector, x(t) ∈ {0, 1}^n for t ∈ Z+, n > 1, with a given initial value
x(0) ≠ 0, and let a ∈ {0, 1}^n. Consider the binary signal s(t) defined by the following
algorithm:
with ⊕ modulo-2 addition; then s(t) is called a pseudo-random binary signal (PRBS). □
(Figure: clocked shift register with states x1, ⋯, xn and feedback coefficients a1, ⋯, an,
generating the output s(t).)
The shift register will generate a binary signal. This is a deterministic sequence: given the
initial state and the coefficient vector a, all future states are completely determined.
It can simply be understood that such a PRBS is a periodic signal. The shift register has
a finite number of states, and each state uniquely determines all future states. The period-
length of the signal is an important property of the PRBS. This leads to the definition of
a special class of PRBSs.
The period length M = 2^n − 1 is the maximum period that is possible for such a signal. Note
that the term −1 is caused by the fact that the 0-state (x = 0) should be circumvented,
since this state forces all future states to be equal to 0.
It will be discussed later on that a maximum length PRBS has properties that resemble
the properties of a white noise signal. First we briefly consider the question under which
conditions a PRBS becomes a maximum length PRBS. Apparently the properties of a
PRBS are completely determined by the dimension n of the state (register) vector, and by
the coefficient vector a that determines the feedback path.
Now let us denote

A(q^{−1}) = 1 ⊕ a1 q^{−1} ⊕ a2 q^{−2} ⊕ ⋯ ⊕ an q^{−n}   (7A.4)

The PRBS s(t) generated as in definition 7A.1 obeys the following homogeneous equation:

A(q^{−1}) s(t) = 0   (7A.5)

This can be understood by realizing that s(t) = xn(t) = x_{n−j}(t − j) for j = 1, ⋯, (n − 1).
As a result

A(q^{−1}) s(t) = xn(t) ⊕ a1 xn(t−1) ⊕ ⋯ ⊕ an xn(t−n)   (7A.6)
             = x1(t−n+1) ⊕ a1 x1(t−n) ⊕ ⋯ ⊕ an x1(t−2n+1)   (7A.7)
             = 0   (7A.8)

where the latter equality follows from the fact that x1(t−n+1) equals a1 x1(t−n) ⊕ ⋯ ⊕
an x1(t−2n+1) by definition. The problem to study now is the choice of the feedback coefficients
ai such that the equation (7A.5) has no solution s(t) with period smaller than 2^n − 1. A
necessary and sufficient condition on A(q^{−1}) for this property to hold is provided in the
following proposition.
Proposition 7A.3 The homogeneous recursive relation (7A.5) has only solutions of period
2^n − 1 (i.e. the corresponding PRBS is a maximum length PRBS) if and only if the following
two conditions are satisfied:
The binary polynomial A(q^{−1}) is irreducible, i.e. there do not exist any two polynomi-
als A1(q^{−1}) and A2(q^{−1}) with binary coefficients such that A(q^{−1}) = A1(q^{−1})A2(q^{−1})
and A1, A2 ≠ A;
A(q^{−1}) is a factor of 1 ⊕ q^{−M} but is not a factor of 1 ⊕ q^{−p} for any p < M = 2^n − 1.
For the proof of this proposition the reader is referred to Davies (1970) or Soderstrom and
Stoica (1989). This result has led to the construction of tables of polynomials satisfying the
conditions of the proposition. Examples are: for n = 3 : a1 = 1, a3 = 1; for n = 6 : a1 = 1,
a6 = 1; for n = 10 : a3 = 1, a10 = 1. Here all coefficients that are not mentioned should be
chosen 0.
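A minimal sketch of the shift-register mechanism of Definition 7A.1 is given below, using the tabulated feedback coefficients for n = 3 (a1 = a3 = 1); the initial state and all other choices are illustrative.

```python
import numpy as np

# Sketch: maximum length PRBS generated by a binary shift register with
# modulo-2 feedback, here with n = 3 and feedback taps a1 = a3 = 1.
def prbs(n, a_idx, N, x0=None):
    x = np.ones(n, dtype=int) if x0 is None else np.array(x0, dtype=int)  # x(0) != 0
    s = np.empty(N, dtype=int)
    for t in range(N):
        s[t] = x[-1]                                        # output = last register state
        fb = np.bitwise_xor.reduce(x[np.array(a_idx) - 1])  # modulo-2 feedback sum
        x[1:], x[0] = x[:-1].copy(), fb                     # shift and insert feedback
    return s

s = prbs(3, a_idx=[1, 3], N=14)     # period M = 2^3 - 1 = 7
print(s)                            # the pattern repeats after 7 samples
u = 1.0 * (-1 + 2 * s)              # map {0,1} -> {-c, +c} as in (7A.9), c = 1
```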
A maximum length PRBS has properties that are similar to the properties of a discrete-time
white noise sequence. This will be formulated in the next proposition. However, since we
are generally interested in signals that vary around 0, rather than signals that vary between
0 and 1, we can simply transform the PRBS to

u(t) = c[−1 + 2s(t)]   (7A.9)

with s(t) a maximum length PRBS as discussed before. The binary signal u(t) will vary
between the two values −c and +c.
Proposition 7A.4 (Davies, 1970). Let u(t) be a maximum length PRBS according to
(7A.9) and definition 7A.2, with period M. Then

Ē u(t) = c/M   (7A.10)
Ru(0) = (1 − 1/M²) c²   (7A.11)
Ru(τ) = −(c²/M)(1 + 1/M), τ = 1, ⋯, M − 1   (7A.12)

(Sketch: the periodic covariance function Ru(τ), with peak value c²(1 − 1/M²) at τ = 0
(mod M) and level −(c²/M)(1 + 1/M) elsewhere.)
Note that for M → ∞ the covariance function of u(t) resembles that of white noise with
variance c². Due to their easy generation and their convenient properties, maximum
length PRBSs have been used widely in system identification. The PRBS resembles white
noise as far as the spectral properties are concerned. Influencing the signal spectrum by
changing the clock-period, as described for RBS signals, can similarly be applied to
PRBS signals.
Ru(τ) = lim_{p→∞} (1/(Nc·p)) Σ_{t=0}^{Nc·p−1} E u(t + τ) u(t)   (7A.14)

= lim_{p→∞} (1/Nc) Σ_{m=0}^{Nc−1} (1/p) Σ_{s=0}^{p−1} E e(s + int((m + τ)/Nc)) e(s + int(m/Nc))

= lim_{p→∞} (1/Nc) Σ_{m=0}^{Nc−1} (1/p) Σ_{s=0}^{p−1} E e(s + int((m + τ)/Nc)) e(s)

= (1/Nc) Σ_{m=0}^{Nc−1} E e(s + int((m + τ)/Nc)) e(s)

which, for 0 ≤ τ ≤ Nc, can be split into the terms with int((m + τ)/Nc) = 0 and those with
int((m + τ)/Nc) = 1:

= (1/Nc) [ Σ_{m=0}^{Nc−τ−1} E e(s)² + Σ_{m=Nc−τ}^{Nc−1} E e(s + 1) e(s) ]

= (1/Nc)(Nc − τ) = (Nc − τ)/Nc.
A signal with the same covariance function can be obtained by filtering a white noise process
{e(t)} through a linear filter:

w(t) = F(q) e(t)   (7A.15)

with

F(q) = (1/√Nc)(1 + q^{−1} + q^{−2} + ⋯ + q^{−(Nc−1)}) = (1/√Nc) · (1 − q^{−Nc})/(1 − q^{−1}).   (7A.16)

Note that Rw(τ) = E [Σ_{j=0}^{∞} f(j) e(t − j)] [Σ_{k=0}^{∞} f(k) e(t − τ − k)] with f(j) = 1/√Nc for 0 ≤ j ≤
Nc − 1 and 0 elsewhere. Consequently Rw(τ) = Σ_{j=0}^{Nc−1} f(j) f(j − τ) = Σ_{j=−∞}^{∞} f(j) f(j − τ),
which can simply be shown to coincide with Ru(τ). As a result Φu(ω) = F(e^{iω}) F(e^{−iω}),
which equals

(1/Nc) · ((1 − e^{iωNc})(1 − e^{−iωNc})) / ((1 − e^{iω})(1 − e^{−iω}))   (7A.17)

which is equivalent to the expression in the proposition.
Bibliography
K.J. Astrom and B. Wittenmark (1984). Computer Controlled Systems. Prentice-Hall,
Englewood Cliffs, NJ.
A.C.P.M. Backx and A.A.H. Damen (1989). Identification of industrial MIMO processes
for fixed controllers. Journal A, vol. 30, no. 1, pp. 3-12.
W.D.T. Davies (1970). System Identification for Self-Adaptive Control. Wiley-Interscience,
New York.
M. Gevers (1993). Towards a joint design of identification and control? In: H.L. Trentel-
man and J.C. Willems (Eds.), Essays on Control: Perspectives in the Theory and its
Applications. Proc. 1993 European Control Conference, Groningen, The Netherlands,
Birkhauser, Boston, pp. 111-151.
K. Godfrey (Ed.) (1993). Perturbation Signals for System Identification. Prentice Hall,
Hemel Hempstead, UK.
G.C. Goodwin and R.L. Payne (1977). Dynamic System Identification - Experiment Design
and Data Analysis. Academic Press, New York.
Y. Inouye and T. Kojima (1990). Approximation of linear systems under the constraint of
steady state values of the step responses. In: M.A. Kaashoek et al. (Eds.), Realization
and Modelling in System Theory. Proc. Intern. Symposium MTNS-89, Volume 1, pp.
395-402. Birkhauser Boston Inc., Boston.
L. Ljung (1987). System Identification - Theory for the User. Prentice-Hall, Englewood
Cliffs, NJ.
R. Pintelon and J. Schoukens (2001). System Identification - A Frequency Domain Ap-
proach. IEEE Press, Piscataway, NJ, USA, ISBN 0-7803-6000-1.
J. Schoukens, P. Guillaume and R. Pintelon (1995). Generating piecewise-constant excita-
tions with an arbitrary power spectrum. IEE Proc. Control Theory Appl., Vol. 142, no.
3, pp. 241-252.
R.J.P. Schrama (1992). Approximate Identification and Control Design. Dr. Dissertation,
Delft Univ. Technology.
H.J.A.F. Tulleken (1990). Generalized binary noise test-signal concept for improved identi-
fication experiment design. Automatica, vol. 26, no. 1, pp. 37-49.
P.M.J. Van den Hof and R.J.P. Schrama (1995). Identification and control - closed loop
issues. Automatica, vol. 31, pp. 1751-1770.
B. Wahlberg and L. Ljung (1986). Design variables for bias distribution in transfer function
estimation. IEEE Trans. Automat. Contr., AC-31, pp. 134-144.
Y. Zhu and T. Backx (1993). Identification of Multivariable Industrial Processes. Springer
Verlag, Berlin.
Chapter 8
8.1 Introduction
Considering the general identification procedure as sketched in figure 1.9, there are two
issues that have yet to be discussed. In chapter 5 it has been discussed extensively which
different model structures can be applied. However, the design choice of a particular model
set in a given situation has not been addressed yet. This will be the topic of section 8.2.
Additionally, in section 8.3 the issue of model validation will be addressed. These issues are
collected in one chapter as they exhibit many relationships.
and on which the identification algorithms are based. In this course we have almost
exclusively addressed a parametrization of {[G(q, θ) H(q, θ)]} in terms of fractions of
polynomials.
The ultimate goal of the user of identification methods will be to find a good model at a
low price. These notions of quality and price can provide criteria on the basis of which
appropriate choices for a model set can be made.
When is a model good?
Despite the fact that the acceptance of a given model will be dependent on its ultimate
use, a general expression of the quality of a model can be given by stating that one aims
at a model with small bias and small variance. Now this expression in itself again reflects
conflicting requirements. The wish to achieve a small bias motivates the use of large, flexible
model sets (high order models), such that the undermodelling error is small. Aiming
for a small variance, on the other hand, motivates the use of only a limited number of
unknown parameters. This latter statement originates from the property that the variance
of estimated parameters generally will increase with an increasing number of parameters.
One could say that the total amount of information that is present in a data sequence is
fixed; when this information has to be divided over a larger set of estimated parameters,
the information-per-parameter is reduced, leading to a larger parameter variance.
When is a model expensive?
Considering the price of a model one can distinguish two different phenomena.
In the trade-off that has to be made between quality and price, several aspects that have been
treated in the foregoing chapters play a role here, the two most important ones of which
are:
The ability to model the input/output transfer function G(q, θ) independently of the
modelling of the noise contribution through H(q, θ).
Ljung (1987) distinguishes four different sources of information, c.q. types of considerations,
when discussing the problem of model set selection.
A priori considerations.
Based on physical insight in the process that is going to be modelled, there might be clear
information on the (minimal) model order that may be required for modelling the system
accurately. When on physical grounds something can be said about the character of the
noise disturbance on the data, one may also be able to find arguments for choosing a
particular model structure. For instance, when there is reason to believe that the noise
disturbance on the measured data actually enters the process at a location such that it
contains the process dynamics in its colouring, there is a good argument for choosing an
ARMAX model structure.
Additionally, before any data is processed in an identification algorithm, one can make a
statement concerning the relation between the number of parameters to be estimated (Nθ)
and the number of data points (N) that is available. It is apparent that when estimating
50 parameters on the basis of 50 data points, the criterion function will take a very small
value, but the estimated parameters will be very unreliable. In this situation no data
reduction has been achieved. Generally there has to hold that N ≫ Nθ. A more specific
rule-of-thumb that is often used is

N > 10 Nθ   (8.1)

but this relation has to be used with care, as the resulting variance is of course dependent
on the signal-to-noise ratio in the measured data sequences.
y(t) = [ (b0 + b1 q^{−1} + ⋯ + bn q^{−n}) / (1 + a1 q^{−1} + ⋯ + an q^{−n}) ] u(t) = φn^T(t) θ0,   (8.2)

with
Then R(n) := Ē φ(t)φ^T(t) is a Toeplitz-structured matrix that is nonsingular provided that
the input signal is sufficiently exciting; see also the analysis in section 5.6. However, when
considering R(n+1), it follows directly from (8.2) that this latter matrix will be singular, as
one element is added to φ(t) that is linearly dependent on the other elements.
So in general terms we can formulate an order test by evaluating the rank of R(i) for
increasing values of i; once the matrix becomes (almost) singular (say for i = j), this
indicates that the system order should be j − 1.
In practice the Toeplitz matrix will be composed of elements of the sample correlation
functions, i.e.

R̂(n) = (1/N) Σ_{t=1}^{N} φn(t) φn^T(t).   (8.4)

det R̂(n)  or  σ_min(R̂(n)) / σ_max(R̂(n))

where σ_min and σ_max denote respectively the minimum and maximum singular value of the
corresponding matrix.
It has to be stressed that this model order test is principally based on the availability of
noise-free data, which of course is rather impractical. When there is noise present on the
data, an exact rank drop will hardly occur, but when the noise contribution is only small, one
may expect that the test is still valid and the Toeplitz matrix becomes almost singular
in the case of overestimating the model order. In the situation of noise-disturbed data the
model order test becomes a test that has close connections to the ARX model structure.
Model order estimates that are obtained in this way can therefore be quite different from
model order estimates that are obtained with other model structures.
for the different parameters θ̂N that are estimated for several choices of model orders. In
figure 8.1 (left) this is shown for an ARX structure, where on the X-axis the several ARX
model sets are sketched, indicated by their number of parameters. For a given number of
parameters (na + nb + 1), several different model sets are possible, dependent on the separate
values of na and nb. The minimal value of the loss function VN(θ̂N, Z^N) is plotted for all
the different model sets.
The reasoning here is that one may expect a substantially decreasing value of VN(θ̂N, Z^N)
until the correct model set is reached. By choosing a model set that is too large (too
high polynomial orders), the reduction in VN(θ̂N, Z^N) will only be moderate. As a result
one is looking for the "knee" in the characteristic plot.
It has to be noted here that in this type of plot the value of VN(θ̂N, Z^N) will always decrease
with an increasing number of parameters, simply because of the fact that within a larger model
set one necessarily finds a lower minimum of the criterion function. When model orders
are chosen that are too large (relative to the real system), the additional freedom in the
model will be used to tune the model to the specific realization of the noise disturbance.
This mechanism is called overfit.
As the indicated plot of VN(θ̂N, Z^N) may incorporate this mechanism of overfit, an alter-
native can be used, based on a separation of the data into two different sets: one part of
the data that is used for identification of θ̂N, and another part of the data for calculation

Z^N = Z^(1) ∪ Z^(2)

θ̂N^(1) = arg min_θ VN(θ, Z^(1))

VN(θ̂N^(1), Z^(2)) = (1/N^(2)) Σ_{t=1}^{N^(2)} ε(t, θ̂N^(1))².
When in this case the model order in θ̂N^(1) has been chosen too high, this will result in an
increase of VN(θ̂N^(1), Z^(2)). The fit that has been made on the specific noise realization
that was present in the first part of the data will now lead to an increase of the criterion
function when evaluated over the second part of the data. This mechanism, which is referred
to as cross-validation, is sketched in figure 8.1 (right), where a slight increase of the function
value can be observed for Nθ > 5.
Figure 8.1: Model order test for an ARX model structure; system has an ARX structure
with na = 2, nb = 3. Left plot: criterion function (loss function) evaluated on estimation
data; right plot: criterion function evaluated on validation data for models based on
estimation data. X-axis: number of parameters.
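A minimal sketch of this cross-validation procedure is given below, for ARX models estimated by linear least squares with na = nb = n (a simplification of the figure's setup); the data-generating system and noise level are illustrative assumptions.

```python
import numpy as np

# Sketch: ARX model order selection by cross-validation on a split data set.
rng = np.random.default_rng(3)
N = 1200
u = np.sign(rng.standard_normal(N))
e = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(3, N):                       # simulate an illustrative ARX system
    y[t] = 1.2*y[t-1] - 0.5*y[t-2] + 0.8*u[t-1] + 0.3*u[t-2] + 0.1*u[t-3] + e[t]

def phi_matrix(u, y, n):                    # ARX regressors with na = nb = n
    phi = np.column_stack([-np.roll(y, k)[n:] for k in range(1, n + 1)] +
                          [np.roll(u, k)[n:] for k in range(1, n + 1)])
    return phi, y[n:]

ue, ye, uv, yv = u[:600], y[:600], u[600:], y[600:]   # Z^(1) and Z^(2)
for n in range(1, 7):
    Pe, Ye = phi_matrix(ue, ye, n)
    theta = np.linalg.lstsq(Pe, Ye, rcond=None)[0]    # estimate on Z^(1)
    Pv, Yv = phi_matrix(uv, yv, n)
    Vval = np.mean((Yv - Pv @ theta) ** 2)            # evaluate on Z^(2)
    print(f"n = {n}: validation loss {Vval:.4f}")
```

The validation loss stops decreasing once the model order covers the true dynamics, mirroring the behaviour in the right plot of figure 8.1.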
The principle that is used in this test is that, when determining the most suitable set
of models, a penalty should actually be added to the criterion function for increasing
complexity (higher order) of the model sets, in order to cope with this risk of overfitting.
Formally spoken, we would like to evaluate Ē VN(θ̂N, Z^N) rather than VN(θ̂N, Z^N). Analysis
of this problem has led to several so-called information criteria suited for model order
selection, the most important ones of which are
Akaike's Information Criterion (AIC).
This criterion states that the model set should be chosen that achieves the minimum value
for the expression:

(1/2) log VN(θ̂N, Z^N) + Nθ/N   (8.6)

with Nθ the number of parameters in the model set.
This criterion is based on maximum-likelihood considerations, assuming a Gaussian pdf
of the noise disturbance.
Akaikes Final Prediction Error Criterion (FPE).
222 Version 03 December 2005
This criterion proposes to choose the model set that achieves a minimum value for the
expression:
1 + N /N
VN (N , Z N ), (8.7)
1 N /N
which is an estimate of the prediction error variance that is obtained when the identied
model is applied to another data set than the one that is used for the identication.
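Both criteria are simple functions of the loss function values, as in the minimal sketch below; the values of VN and the parameter counts are illustrative numbers only.

```python
import numpy as np

# Sketch: comparing model sets with AIC (8.6) and FPE (8.7) computed from
# the loss function values V_N of a set of estimated models.
N = 600
n_par = np.array([2, 4, 6, 8, 10])           # number of estimated parameters
V = np.array([10.2, 7.1, 6.4, 6.35, 6.33])   # V_N(theta_hat_N, Z^N) per model set

aic = 0.5 * np.log(V) + n_par / N            # criterion (8.6)
fpe = (1 + n_par / N) / (1 - n_par / N) * V  # criterion (8.7)

for crit, name in [(aic, "AIC"), (fpe, "FPE")]:
    print(name, "selects n_par =", n_par[np.argmin(crit)])
```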
Model validation
The ultimate method of verifying whether model structure and model orders have been
chosen appropriately is by validating the identified model. If in the end the identified
model is considered to be acceptable, then apparently an appropriate choice of model set
was made. This is a clear a posteriori approach. If the identified model is not acceptable,
then a different choice of either the model structure and/or the model orders has to be
made. This approach, which was already indicated when discussing the basic identification
scheme in chapter 1, has some flavour of trial and error. Since there is no universally
applicable algorithm for model set selection, this effect is - to some extent - unavoidable.
Further methods for model validation are discussed in the next subsection.
Model reduction. This can be an appropriate tool for verifying whether an esti-
mated transfer function (either G or H) can be accurately approximated by a lower
order representation. When (almost) pole/zero cancellations occur in a pole/zero plot,
this is an indication of a model order that has been chosen too high.
So, in a situation that there is a considerable amount of noise present in the data,
esim(t) does not reveal very much information concerning G(q, θ̂N).
Note that the simulation error cannot be made smaller by adding an (artificial) noise
term to the model output. This is due to the fact that the actual realization of v(t) is
unknown. As a result, a simulation test can only provide information on the accuracy
of the input/output transfer function G(q, θ̂N), and not of H(q, θ̂N).
In this simulation test there is also a risk that an overfit of the model G(q, θ̂N)
leads to a small simulation error, while this does not have to imply that the model
is accurate. Also here a cross-validation is advisable, so that the simulation test is
performed on a data interval that has not been used for identification.
Residual tests. The residual signal ε(t, θ̂N) can exhibit important information on
the validation/invalidation of the identified model.
In a situation of a consistent model estimate, the residual (prediction error) ε(t, θ̂N)
asymptotically becomes a white noise signal. Besides, in the situation G0 ∈ G, a
consistent estimate of the i/o transfer function G(z, θ̂N) will imply that the residual
ε(t, θ̂N) asymptotically becomes uncorrelated with past input samples. Corresponding
to these two situations we can formulate the following two model assumptions or, in
statistical terms, null hypotheses:
We will briefly discuss two tests on auto/cross-correlation functions that reflect the
two hypotheses mentioned above.
We may expect that $R_\varepsilon^N(\tau)/R_\varepsilon^N(0)$ is small for $\tau \neq 0$ and large N, provided that ε(t) is a realization of a white noise process. The question is, how small should we expect this quotient to be?
Define for some m ≥ 1:
$$r_m = \begin{pmatrix} R_\varepsilon^N(1) \\ \vdots \\ R_\varepsilon^N(m) \end{pmatrix} \qquad (8.11)$$
As a result of the central limit theorem it can be shown that $\sqrt{N}\,r_m \to \mathcal{N}(0, \sigma_e^4 I)$ asymptotically, which implies that all components of $r_m$ are asymptotically independent. As a result,
$$\sqrt{N}\,\frac{R_\varepsilon^N(\tau)}{R_\varepsilon^N(0)} \sim As\ \mathcal{N}(0,1) \qquad (8.12)$$
With $N_\alpha$ the α-level of the $\mathcal{N}(0,1)$-distribution (e.g. $N_{0.95} = 1.96$, a 95% reliability interval), the null hypothesis can be accepted if
$$|R_\varepsilon^N(\tau)|/R_\varepsilon^N(0) \leq N_\alpha/\sqrt{N} \qquad (8.14)$$
with the standard notation $R_\varepsilon(k) = \bar E\,\varepsilon(t)\varepsilon(t-k)$, $R_u(k) = \bar E\,u(t)u(t-k)$, and $P = \sum_{k=-\infty}^{\infty} R_\varepsilon(k)R_u(k)$. Using similar notation as before, we can check whether
$$|R_{\varepsilon u}^N(\tau)| \leq N_\alpha\sqrt{P/N} \qquad (8.16)$$
[Figure 8.2: auto-correlation function of the residuals (top) and cross-correlation function between input 1 and residuals from output 1 (bottom), plotted against the lag, with confidence bounds.]
In the Matlab command RESID, both tests (8.12) and (8.16) have been implemented using a 3σ-level of probability, corresponding to $N_{0.99}$. An example is shown in figure 8.2. The residual is taken from a third order ARX model that is estimated on the basis of 600 data points taken from a third order system that also had an ARX structure.
The test on the cross-correlation function allows a check on the accuracy of the plant model G(q, θ̂N). If this test is passed, then the test on the auto-correlation function can be used to validate the noise model H(q, θ̂N). Note that the latter test is more severe, as it requires both G0 and H0 to be modelled accurately.
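The auto-correlation (whiteness) test can also be carried out directly. A minimal sketch in plain MATLAB, assuming a hypothetical residual vector 'res' (e.g. obtained with pe):

N = length(res); m = 25;
R = zeros(m+1, 1);
for tau = 0:m
    R(tau+1) = sum(res(1+tau:N) .* res(1:N-tau)) / N;  % R_eps^N(tau)
end
rho   = R(2:end) / R(1);        % normalized autocorrelations, tau = 1..m
bound = 1.96 / sqrt(N);         % N_0.95 / sqrt(N), cf. (8.14)
whiteness_accepted = all(abs(rho) <= bound)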
Correlation between ε(t) and u(t − τ) for small values of τ can indicate that the presumed time-delay in the model has been chosen too high.
Correlation between ε(t) and u(t − τ) for negative values of τ (the residual signal is correlated with future values of u) can indicate the presence of a feedback loop in the data. However, in the situation that the input signal has a spectrum that is not flat (the signal is non-white), it can also point to inaccurate modelling of the first elements of the pulse response of G0 (see Problem 8.1). This latter situation may occur if the time-delay in the model is chosen too high.
As may be clear from the list of tools given above, there is not one single (and optimal) method for validating an identified model.
Example 8.3.1 In order to illustrate the role of the correlation tests, we consider a data generating system S that we excite with a step signal. The input signal and the observed output signal are depicted in Figure 8.3.
[Figure 8.3: step response; input u(t) (red) and output y(t) (blue).]
From the observed response we carefully conclude that the system G0 has a limited order and a time delay of around nk = 3.
Next a data set is generated on the basis of N = 5000 data points, using a random white noise signal as input. As a first model structure we evaluate an Output Error model structure with nb = nf = 2, nk = 3.
For this estimated Output Error model, the results of the correlation tests are given in Figure 8.4 (left). Since both tests peak out of their confidence bounds, both models G(θ̂N) and H(θ̂N) are invalidated.
Next the order of the Output Error model set is increased to 3, according to nb = nf = 3; nk = 3. The results of the corresponding correlation tests are given in Figure 8.4 (right). It now appears that G(θ̂N) is validated, but H(θ̂N) is not.
In the next step a noise model is introduced, through the choice of a Box-Jenkins model structure with nb = nf = 3; nc = nd = 3; nk = 3. The corresponding results in Figure 8.5 (left) show that the order of the noise model is still not large enough. Increasing the order of the noise model to 4, leading to BJ with nb = nf = 3; nc = nd = 4; nk = 3, leads to an identified model for which both G(θ̂N) and H(θ̂N) are validated (see Figure 8.5 (right)).
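The structure search of this example can be reproduced along the following lines; a sketch assuming the System Identification Toolbox and a hypothetical data set 'data':

m1 = oe(data, [2 2 3]);      resid(m1, data);   % [nb nf nk] = [2 2 3]: both tests fail
m2 = oe(data, [3 3 3]);      resid(m2, data);   % G validated, H not
m3 = bj(data, [3 3 3 3 3]);  resid(m3, data);   % [nb nc nd nf nk]: noise order too low
m4 = bj(data, [3 4 4 3 3]);  resid(m4, data);   % both G and H validated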
Figure 8.4: Auto- (upper figure) and cross-correlation (lower figure) tests for OE-model [2 2 3] (left) and OE-model [3 3 3] (right).
Figure 8.5: Auto- (upper figure) and cross-correlation (lower figure) tests for BJ-model [3 3 3 3 3] (left) and BJ-model [3 4 4 3 3] (right).
where a cross-validation is used for validation purposes. However, be sure that the order is chosen sufficiently high (in particular in the ARX case) such that the model for G0 is at least accurate. This can be checked by inspection of the cross-correlation function R_εu(τ), preferably applied to a validation data set. A correct model for G0 may require a model order that seems much too high, but this is often due to the fact that in an ARX model the high order is needed to provide an accurate estimate of H0. This step is particularly directed towards achieving an accurate estimate of G0.
Step 4
Evaluate the pole-zero plot of the estimate of G0 from the previous step, and determine the order of the model. Use this order to identify an Output Error model, and validate this model with the appropriate tests.
Step 5
If a validated model for G0 has been obtained, extend the model structure with a noise model, e.g. through a Box-Jenkins model structure, and include an appropriate model for H0. Validate the end result, e.g. by inspection of the auto-correlation test on the residual.
Model evaluation
[y] = idsim (ue,IDMODEL)
[e] = pe (IDMODEL,DATA)
[e] = resid (IDMODEL,DATA)
[yh] = compare (DATA,IDMODEL,M)
[yp] = predict (IDMODEL,DATA,M)
idsim simulates the output of estimated models; pe calculates the prediction error (residual) signal; resid performs the residual auto-correlation and cross-correlation tests; compare compares predicted and simulated outputs with the measured data; predict generates the M-step-ahead prediction of the output.
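A hypothetical usage of these commands (the older toolbox syntax as listed above is assumed):

e  = pe(IDMODEL, DATA);           % prediction error (residual) signal
resid(IDMODEL, DATA);             % auto- and cross-correlation tests with bounds
yh = compare(DATA, IDMODEL, 10);  % compare 10-step-ahead predictions with the data
yp = predict(IDMODEL, DATA, 10);  % 10-step-ahead prediction of the output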
Appendix
Problem 8.1
Show that for a non-white input signal u, the occurrence of a nonzero cross-correlation function R_εu(τ) for negative values of τ can be caused by a too high chosen time-delay in the model.
Solution
Consider the residual signal
$$\varepsilon(t,\theta) = H^{-1}(q)\left\{[G_0(q) - G(q,\theta)]u(t) + v(t)\right\} \qquad (8A.1)$$
with H_u the stable causal spectral factor of Φ_u(ω), and e_u having a flat spectrum. Writing the relevant filtered responses in terms of pulse responses f1 and f2, one obtains
$$R_{\varepsilon u}(\tau) = \sigma_{e_u}^2\sum_{k=-\infty}^{\infty} f_1(k)f_2(k-\tau)$$
9.1 Introduction
9.1.1 Closed-loop configuration and problem setting
Many systems operate under feedback control. This can be due to required safety of operation or to unstable behaviour of the plant, as occurs in many industrial production processes like paper production, glass production, and separation processes like crystallization. But also mechanical servo systems like robots, positioning systems such as wafer steppers (for the production of integrated circuits) and the servo system present in a compact disc player are examples of processes that typically exhibit unstable dynamical behaviour in open loop. As a consequence, experiments can only be performed in the presence of a stabilizing controller. Even in situations where plants are stable, production restrictions can be strong reasons for not allowing experiments in open loop.
Many processes in non-technical areas, as for example biological and economical systems, operate only under closed-loop conditions, and it is not even possible to remove the feedback loop.
There can be additional reasons for performing experiments in closed loop. Suppose that the plant under consideration is operating under control of a given controller, and that the objective of the identification is to design a better performing controller for the plant. Then the plant dynamics that exhibit themselves in the presence of the old controller might be much more relevant for designing an improved controller than the open-loop dynamics.
It is very important to know if and how the open-loop system can be identified when it must operate under feedback control during the experiment. It will be shown that the feedback can cause difficulties, but also that in some specific situations these difficulties may be circumvented.
The experimental situation to be discussed in this chapter is depicted in Fig. 9.1. The data generating system is assumed to be given by the relations
$$y(t) = G_0(q)u(t) + v(t), \qquad v(t) = H_0(q)e(t) \qquad (9.1)$$
$$u(t) = C(q)[r_2(t) - y(t)] + r_1(t) \qquad (9.2)$$
with e a white noise process.
[Figure 9.1: closed-loop configuration with plant G0, noise filter H0, controller C, disturbance v, and external signals r1 (added at the controller output) and r2 (added at the controller input).]
In contrast to the situation dealt with in open-loop identification, the input {u(t)} and noise {e(t)} are no longer uncorrelated, due to the presence of the feedback controller C(q).
In (9.2) the signal r1 can be a reference value, a setpoint or a noise disturbance on the regulator output. Similarly, the signal r2 can be a setpoint or a measurement noise on the output signal. In situations where we are not interested in the separate effects of r1 and r2, we will deal with the signal r defined by
$$r(t) = r_1(t) + C(q)r_2(t).$$
In most situations the goal of identification of the system above is the determination of an estimate of the transfer function G0(z) and possibly H0(z). Sometimes one may also wish to determine the controller C(z) in the feedback path.
Referring to the system relations (9.1)-(9.4), the closed-loop data generating system can be shown to be characterized by:
$$y(t) = G_0(q)S_0(q)r(t) + S_0(q)H_0(q)e(t) \qquad (9.5)$$
$$u(t) = S_0(q)r(t) - C(q)S_0(q)H_0(q)e(t) \qquad (9.6)$$
with S0(q) = [1 + C(q)G0(q)]^{-1} the sensitivity function of the closed loop.
In some parts of this chapter we will adopt an appropriate matrix notation of the transfer functions that allows a correct interpretation of the results also in the case of multivariable systems. To this end we distinguish between the input sensitivity and output sensitivity function, denoted as
$$S_0 = (I + CG_0)^{-1}, \qquad W_0 = (I + G_0C)^{-1}.$$
Using this notation and employing the fact that W0G0 = G0S0 and CW0 = S0C, we can rewrite the system equations in the form (9.9)-(9.10). However, when it is not explicitly stated otherwise, we will consider the scalar situation, for which W0 = S0.
Dealing with a system configuration as sketched in figure 9.1, we will assume that the closed-loop system is internally stable, which is guaranteed by stability of the four transfer functions in (9.9)-(9.10), as formalized in the next definition.
Note that in our configuration the input-output transfer function G0 does not necessarily have to be stable. However, many of the identification methods discussed will only be able to provide stable estimates G(q, θ̂N), due to the requirement of a uniformly stable model set (see definition 5.4.2). We will assume that G0 is stable unless this is specifically discussed in the text. Additionally it will be assumed that r is quasi-stationary. The considered experimental situation then is in accordance with Assumption 5.7.1 concerning the data generating mechanism.
Hence, assuming that the spectral densities Φ_u(ω) and Φ_yu(ω) can be estimated exactly, which should be true at least asymptotically as the number of data points tends to infinity, the spectral analysis estimate of G0(e^{iω}) is found accordingly.
The nonparametric model used by the spectral analysis method by its very definition has no structural restrictions (e.g. not even a restriction of causality). Hence it cannot eliminate certain true but uninteresting relationships between {u(t)} and {y(t)} (such as the inverse feedback law).
The situation should be different if a parametric model is used. However, in the next example we will first illustrate that direct application of parametric identification techniques can also lead to undesirable results.
which is the prediction error when we apply data obtained from closed-loop experiments. From this we conclude that all models (â, b̂) subject to
$$\hat a = a_0 + \gamma f, \qquad \hat b = b_0 + \gamma$$
with γ an arbitrary scalar (i.e. all models having â − b̂f equal and fixed) generate the same prediction error (9.21) under the feedback (9.19). Consequently there is no way to distinguish between these models, as they induce the same prediction errors. Notice that it is of no help to know the regulator coefficient f. Apparently the experimental condition (9.19) is not informative enough with respect to the model set (9.20). It is true, though, that the input signal {u(t)} is persistently exciting of sufficient order, since it consists of filtered white noise. Persistence of excitation is thus not a sufficient condition on the input in closed-loop experiments.
If the model set (9.20) is restricted, for example by constraining b̂ to be 1, then it is clear that the data are sufficiently informative to distinguish between different values of the a-parameter. □
It has been illustrated that there do exist specific closed-loop identification problems, both with respect to the consistency of the results and with respect to a possible lack of uniqueness of estimates due to the experimental situation.
(a) Consistency of (G, H). This is a basic requirement. Whenever our model set is rich enough to contain the data generating system (S ∈ M), the identification method should be able to consistently identify the plant, under additional conditions on excitation properties of the plant signals.
In the sequel of this chapter these problems will be referred to as Problems (a)-(c). By far most results on closed-loop identification deal with the consistency problem (a). Standard references are Gustavsson, Ljung and Söderström (1977) and Anderson and Gevers (1982). For an overview of these results see also Söderström and Stoica (1989). Approximate identification under closed-loop circumstances is an area that has only recently been given attention in the literature.
Additional to the properties mentioned in the definition above, attention will be given to the following issues:
(d) Fixed model order. The ability of identification methods to consider model sets G of models with a fixed and prespecified model order. This property is important when the application of the identified model, e.g. in model-based control design, puts limitations on the acceptable complexity of the model.
(e) Unstable plants. The ability to (consistently) identify unstable plants.
(f) Stabilized model (G(q, θ̂), C). This refers to the situation that there is an a priori guarantee that the (asymptotically) identified model G(q, θ̂) is guaranteed to be stabilized by the present controller C. This property might be relevant when the identified model is going to be used as a basis for redesigning the controller.
(g) Knowledge of controller C. This concerns the question whether exact knowledge of the controller is required by the considered identification method.
When starting to discuss the closed-loop identification problems, we have to formalize what is the plant data and a priori information that is referred to in the problem definition.
What do we consider to be available as plant data? Several situations have to be distinguished, and different methods to tackle the problems take different plant data as a starting point. Information on several levels and of several types can be available:
- measurements of r1, r2;
- knowledge of C(q).
Most identification methods discussed in this chapter assume that measured values of y, u are available. However, for the other items the required information depends on the method used. When discussing each method, we will make a clear reference to the required plant data and a priori knowledge.
Apart from the regular assumptions on data and data generating system as also used in Chapter 5, we will regularly need one additional condition:
Assumption 9.1.4 The data generating system S and the model set M satisfy the condition that G0(z) and G(z, θ) are strictly proper for all θ ∈ Θ, i.e. G0(∞) = G(∞, θ) = 0 for all θ ∈ Θ.
This assumption is weak and is introduced to avoid algebraic loops in the closed-loop system. An algebraic loop occurs if neither G0(z) nor C(z) contains a delay. Then y(t) depends instantaneously on u(t), which in turn depends instantaneously on y(t). For some identification methods, as discussed later on, such a situation would make consistent estimation impossible. To avoid this situation it is assumed that the system G0 has at least one delay, so that y(t) depends only on past input values. For multivariable situations this restriction can be relaxed further, as discussed in Van den Hof et al. (1992).
with φ(t) the regression vector (vector with explanatory variables), given by
$$\varphi(t) = [-y(t-1)\ \cdots\ -y(t-n_a)\ \ u(t-1)\ \cdots\ u(t-n_b)]^T$$
and ζ(t) a vector of instrumental variables. The instrumental variable (IV) estimate is given by
$$\hat\theta_N^{IV} = \left[\frac{1}{N}\sum_{t=1}^N \zeta(t)\varphi^T(t)\right]^{-1}\left[\frac{1}{N}\sum_{t=1}^N \zeta(t)y(t)\right] \qquad (9.25)$$
or equivalently
$$y(t) = \frac{B_0(q^{-1})}{A_0(q^{-1})}u(t) + \frac{1}{A_0(q^{-1})}w(t) \qquad (9.27)$$
with B0(q^{-1})/A0(q^{-1}) = G0(q), and {w(t)} any stationary stochastic process with rational spectral density, it follows that
$$\hat\theta_N^{IV} = \theta_0 + \left[\frac{1}{N}\sum_{t=1}^N \zeta(t)\varphi^T(t)\right]^{-1}\left[\frac{1}{N}\sum_{t=1}^N \zeta(t)w(t)\right]. \qquad (9.28)$$
The IV estimator provides a consistent parameter estimate, plim_{N→∞} θ̂N^{IV} = θ0, under the following two conditions:
$$\bar E\,\zeta(t)\varphi^T(t) \ \text{is nonsingular} \qquad (9.29)$$
$$\bar E\,\zeta(t)w(t) = 0 \qquad (9.30)$$
Note that in this analysis the question whether the experiments have been taken from the plant operating under either open-loop or closed-loop experimental conditions has not been raised yet. This aspect comes into the picture when analyzing the two conditions (9.29), (9.30). In a closed-loop situation, in general every element of the regression vector (5.79) will be correlated with the noise disturbance w(t), and consequently the choice ζ(t) = φ(t) will definitely not satisfy condition (9.30). However, when there is an external signal r available that is uncorrelated with the noise w, this gives the possibility to satisfy both conditions (9.29), (9.30).
Take for instance
$$\zeta(t) = [r(t)\ \ r(t-1)\ \cdots\ r(t-d+1)]^T \qquad (9.31)$$
It can simply be verified that ζ(t) is not correlated with the noise signals in the loop that originate from w, and moreover that it is indeed correlated with the input and output samples in the regression vector φ(t), as required by (9.29).
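As a minimal numerical sketch of (9.25) with the instruments (9.31) (here d = 2 and a first-order ARX model; the column vectors y, u, r are hypothetical measured signals):

t    = 2:length(y);
phi  = [-y(t-1), u(t-1)];               % regression vector phi(t)
zeta = [ r(t),   r(t-1)];               % instruments zeta(t), cf. (9.31)
thIV = (zeta' * phi) \ (zeta' * y(t));  % theta_IV per (9.25)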
This brings us to the following formal result:
Proposition 9.2.1 Let S be a data generating system and let Z∞ be a data sequence corresponding to Assumptions 5.7.1, 5.7.2, with r and e uncorrelated. Let M be an ARX model set that is uniformly stable with parameter set Θ ⊂ IR^d such that S ∈ M. Let {ζ} be an instrumental variable determined by (9.31). Then, under weak conditions such as (9.29), the instrumental variable estimate (5.81) is consistent.
                                          IV method
Consistency (G(q, θ̂N), H(q, θ̂N))          +
Consistency G(q, θ̂N)                       +
Tunable bias                               -
Fixed model order                          +
Unstable plants                            +
(G(q, θ̂N), C) stable                       -
C assumed known                            no
Table 9.1: Properties of the IV method.
Note that the validity of the mentioned condition (9.29) will depend on the data generating system and the experimental condition. It will be necessary that the instrumental variable signal r has (some unspecified) sufficient excitation properties.
In the discussion so far we have not required whiteness of the noise contribution w. This means that we can hold a similar reasoning as above, based on a data generating system similar to (5.84) but with w chosen as:
$$w(t) = \frac{H_0(q)}{A_0(q)}e(t) \qquad (9.32)$$
Proposition 9.2.2 Consider the situation of Proposition 9.2.1 with an ARX model set satisfying G0 ∈ G (and not necessarily S ∈ M). Then, under weak conditions, the same instrumental variable estimate provides a consistent estimate G(q, θ̂N^{IV}), i.e. for all ω, G(e^{iω}, θ̂N^{IV}) → G0(e^{iω}) w.p. 1 as N → ∞.
With this IV method, problems (a) and (b) as mentioned in section 9.1.1 can be solved under fairly general conditions. Since the IV identification method does not have the form of a criterion optimization (as e.g. a least squares method), an approximation criterion for the case G0 ∉ G, as is the topic of problem (c), cannot be made as explicit as in the situation of open-loop prediction error methods.
The several properties of the IV identification method, in view of the aspects mentioned in the beginning of the chapter, are summarized in Table 9.1.
For IV estimates the model set that is considered has an ARX structure. Since this structure provides a uniformly stable model set, in terms of definition 5.4.2, even in situations where the roots of the polynomial A(q, θ) are outside the unit circle, the IV identification method has no problem with consistent identification of unstable plants. This of course under the condition that the experiments are performed under closed-loop conditions with a stabilizing controller.
IV methods for closed-loop system identification are further discussed in Söderström et al. (1987) and Gilson and Van den Hof (2003).
with
$$T_r(q,\theta) = \frac{S_0(q)}{H(q,\theta)}\,[G_0(q) - G(q,\theta)] \qquad (9.35)$$
$$\text{and}\quad T_e(q,\theta) = \frac{H_0(q)}{H(q,\theta)}\cdot\frac{S_0(q)}{S(q,\theta)} \qquad (9.36)$$
where S(q, θ) is the sensitivity function of the model, i.e. S(q, θ) = [1 + C(q)G(q, θ)]^{-1}.
When G0(q) and G(q, θ) are strictly proper, as assumed in Assumption 9.1.4, and H0(q) and H(q, θ) are proper and monic, it can simply be verified that T_e(q, θ) is proper and monic for all θ. Consequently this transfer function can be written as
$$T_e(q,\theta) = 1 + T_e^{sp}(q,\theta) \qquad (9.37)$$
with T_e^{sp}(q, θ) the strictly proper part of T_e(q, θ).
In the direct identification method, the estimated model is obtained by applying a prediction error criterion directly to the measured signals u and y, while according to the convergence result given in Chapter 5 the parameter estimate converges with probability 1 to its asymptotic value θ* determined by
$$\theta^* = \arg\min_\theta \bar V(\theta), \qquad \bar V(\theta) = \bar E\,\varepsilon^2(t,\theta) \qquad (9.40)$$
Considering this criterion in view of (9.34) and (9.37), and taking account of the fact that {e(t)} is a white noise process that is uncorrelated with {r(t)}, we can directly derive the following lower bound on V̄(θ):
Proposition 9.3.1 Let S be a data generating system and let Z∞ be a data sequence corresponding to Assumptions 5.7.1, 5.7.2, with {r(t)} uncorrelated with {e(t)}. Let M be a uniformly stable model set with parameter set Θ ⊂ IR^d such that S ∈ M, and let Assumption 9.1.4 be satisfied for S and M.
If {r(t)} is persistently exciting of a sufficiently high order, then θ* ∈ arg min_θ V̄(θ) if and only if
$$T_r(e^{i\omega},\theta^*) = 0 \quad \text{and} \quad T_e(e^{i\omega},\theta^*) = 1 \quad \text{for all } \omega \in \mathbb{R}. \qquad (9.42)$$
The proposition specifies the convergence result from Chapter 5 for the closed-loop situation. It follows directly from (9.41), taking into account that, since r and e are uncorrelated, the contributions of the two terms in (9.34) simply add up in the identification criterion. Since G(q, θ) = G0(q), H(q, θ) = H0(q) indeed satisfies the conditions (9.42), it follows that the lower bound in (9.41) can indeed be reached within M.
In the proposition the asymptotically identified models are completely characterized. The question whether and in which situation we can arrive at consistent estimates G(e^{iω}, θ*), H(e^{iω}, θ*) can now be reformulated as the question whether (9.42) implies that G(e^{iω}, θ*) = G0(e^{iω}) and H(e^{iω}, θ*) = H0(e^{iω}) for all ω.
We will analyse this situation for two different experimental situations, briefly treated in the following two subsections.
As a result, a unique and consistent model is obtained despite the presence of feedback. Note also that we do not have to be able to measure the external signal {r(t)}. It is sufficient that this external (and persistently exciting) signal is present.
and consistency is obtained if and only if this equation implies that G(e^{iω}, θ*) = G0(e^{iω}) and H(e^{iω}, θ*) = H0(e^{iω}) for all ω.
Using (9.36) we can rewrite (9.45) as an equivalent relation that has to hold for all ω. However, without any additional conditions this relation is not sufficient to conclude uniqueness of the solution. It depends on the system, the model set and the controller whether or not this equation implies that G(e^{iω}, θ*) = G0(e^{iω}) and H(e^{iω}, θ*) = H0(e^{iω}).
As an illustration of the problem, consider the following heuristic reasoning. If the controller C(q) has low order, then the transfer function H(q, θ)^{-1}(1 + G(q, θ)C(q)) may also be of low order, and the identity above may then give too few restrictions to determine the estimated model uniquely. On the other hand, if C(q) has a sufficiently high order, then the identity above will lead to sufficiently many equations to determine the estimated model uniquely.
As an example we will illustrate this phenomenon for the case of an ARX model set.
Example 9.3.3 (Closed-loop ARX identification without external signal.) Consider a data generating system determined by
$$G_0(q) = \frac{B_0(q^{-1})}{A_0(q^{-1})}, \qquad H_0(q) = \frac{1}{A_0(q^{-1})} \qquad (9.47)$$
If these restrictions are not satisfied, then (9.50), (9.51) refer to estimated parameter values that are outside the permitted parameter range, since the degrees of A(q^{-1}, θ) and B(q^{-1}, θ) are fixed to n_m. If n_c < n_m then there indeed exists a whole set of solutions to (9.50), (9.51). If n_c > n_m it follows that D(q^{-1}) = 0, and (9.50), (9.51) will generate a unique solution. If n_c = n_m, any zeroth order polynomial D(q^{-1}) (which means D(q^{-1}) = d ∈ IR) could formally be applied, but since the polynomials A(q^{-1}, θ) and A0(q^{-1}) both have to be monic, the only possible choice is d = 0, which also shows that a unique solution is obtained.
Note, however, that the required order of the controller will generally depend on the data generating system.
As a direct generalization of Proposition 5.9.1 the following consistency result can now be formulated:
Proposition 9.3.6 Let S be a data generating system and let Z∞ be a data sequence corresponding to Assumptions 5.7.1, 5.7.2. Let M be a uniformly stable model set with parameter set Θ ⊂ IR^d such that S ∈ M, and let Assumption 9.1.4 be satisfied for S and M.
If Z∞ is informative enough with respect to M, then
$$G(e^{i\omega},\theta^*) = G_0(e^{i\omega}), \quad H(e^{i\omega},\theta^*) = H_0(e^{i\omega}) \quad \text{for all } \omega. \qquad (9.53)$$
The excitation condition formulated here is less easy to interpret and quantify, and therefore the excitation conditions as formulated in the previous two sections are more convenient in practice.
Note that if we do not apply any order restrictions to the model set M, then the condition under which a data sequence Z∞ will be informative enough (for all M) can quite simply be derived from the frequency domain equivalent of (9.52). By rewriting this latter equation as
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{i\omega})\,\Phi_z(\omega)\,W^T(e^{-i\omega})\,d\omega = 0$$
with W(q) = [W1(q)  W2(q)], and
$$\Phi_z(\omega) = \begin{pmatrix} \Phi_u(\omega) & \Phi_{uy}(\omega) \\ \Phi_{yu}(\omega) & \Phi_y(\omega) \end{pmatrix}$$
it follows that W(e^{iω}) ≡ 0 is implied by the (sufficient) condition that Φ_z(ω) is positive definite for almost all ω.
Note that an informative data sequence can also be generated by a closed-loop system with a nonlinear or a time-varying controller. This will allow the direct identification method to identify consistent estimates of the plant model. The situation of applying a sequence of different controllers during the identification experiment is further considered in Söderström et al. (1976).
where S(e^{iω}, θ) is the model sensitivity function, i.e. S(e^{iω}, θ) = (1 + C(e^{iω})G(e^{iω}, θ))^{-1}. Note that in the integrand of this expression both terms contain the parametrized transfer functions G(q, θ) and H(q, θ).
If we were in a situation with G0 ∈ G and S ∉ M, it cannot be concluded that consistency of G would result, not even if G and H are parametrized independently. This result, which is valid in the open-loop case, cannot be transferred to the closed-loop situation. It can be verified by considering the situation of an Output Error (OE) model set. In this case H(q, θ) = 1. The choice G(q, θ) = G0(q) in (9.54) would make the first term in the integrand zero; however, the second term also depends on G(q, θ) (through S(q, θ)) and will not necessarily be minimal for G(q, θ) = G0(q). Thus in general a solution G(q, θ*) ≠ G0(q) will result, even if G0 ∈ G.
The expression (9.54) characterizes the asymptotic model estimate; however, this characterization is rather implicit. As shown above, it can be deduced that a plant model will generally be biased whenever the noise model is not estimated consistently; the distribution of this bias over frequency cannot be assessed from this expression. In order to provide a more explicit expression for the asymptotic bias, the following analysis is performed (Ljung, 1993):
By denoting G̃(q, θ) = G0(q) − G(q, θ) and H̃(q, θ) = H0(q) − H(q, θ), the prediction error can be decomposed as ε(t, θ) = ε1(t, θ) + e(t), with
$$\varepsilon_1(t,\theta) = \frac{1}{H(q,\theta)}\,[\tilde G(q,\theta)u(t) + \tilde H(q,\theta)e(t)].$$
If either G0 and G(q, θ), or C are strictly proper, then Ē(ε1(t, θ)e(t)) = 0, and so
$$\Phi_\varepsilon(\omega,\theta) = \Phi_{\varepsilon_1}(\omega,\theta) + \sigma_e^2.$$
Writing
$$\varepsilon_1(t,\theta) = \frac{1}{H}\left[\tilde G\,[S_0r(t) - CS_0H_0e(t)] + \tilde H e(t)\right]$$
we can write its spectrum as
$$\Phi_{\varepsilon_1} = \frac{1}{|H|^2}\left[|\tilde G|^2\Phi_u + |\tilde H|^2\sigma_e^2 - 2\,\mathrm{Re}\big(\tilde G\,CS_0H_0\,\tilde H^*\big)\sigma_e^2\right]$$
$$= \frac{\Phi_u}{|H|^2}\left[|\tilde G|^2 + \frac{|\tilde H|^2\sigma_e^2}{\Phi_u} - \frac{2\,\mathrm{Re}\big(\tilde G\,CS_0H_0\,\tilde H^*\big)\sigma_e^2}{\Phi_u}\right]$$
$$= \frac{\Phi_u}{|H|^2}\left[|\tilde G - B|^2 + \frac{|\tilde H|^2\sigma_e^2\,\Phi_u^r}{\Phi_u^2}\right]$$
with $B = \sigma_e^2\,\tilde H\,(CS_0H_0)^*/\Phi_u$ and $\Phi_u^r$ the part of the input spectrum that originates from r.
If we restrict attention to the situation that both G0 and G(θ) are causal and stable, so that G̃ will be causal and stable, then only the causal and stable part of B, denoted [B]_+, will contribute to the minimization of the integral expression.
                                          Direct method
Consistency (G(q, θ̂N), H(q, θ̂N))          +
Consistency G(q, θ̂N)                       -
Tunable bias                               -
Fixed model order                          +
Unstable plants                            2
(G(q, θ̂N), C) stable                       -
C assumed known                            no
Table 9.2: Properties of the direct method.
The closed-loop expressions show that only the noise-free part u^r of the input signal contributes to variance reduction of the estimates. This implies that whenever the input signal of the process is limited in power, only part of the input signal can be used to reduce the variance of a parameter estimate. If there is no input power limitation, one can of course always compensate for this loss of variance-reducing signal by adding a reference signal with increased power.
The given variance expressions are restricted to the situation that S ∈ M and that both G(θ) and H(θ) are identified; they do not hold true for the situation G0 ∈ G, S ∉ M.
Remark 9.3.7 The situation of estimating a plant model in the situation G0 ∈ G with a fixed and correct noise model H* = H0 is considered in Ljung (1993). Using the fact that
$$\mathrm{cov}\,\hat\theta_N \approx \frac{\sigma_e^2}{N}\left[\bar E\,\psi(t)\psi^T(t)\right]^{-1} \qquad (9.64)$$
where ψ(t) is the negative gradient of the prediction error (9.33), this leads to
$$\mathrm{cov}(\hat G) \approx \frac{n}{N}\cdot\frac{\Phi_v}{\Phi_u} \qquad (9.65)$$
as it is immaterial whether the input spectrum is a result of open-loop or closed-loop operation. Note that this expression gives a smaller variance than the situation in which both G and H are estimated, and that in this (unrealistic) case the total input power contributes to a reduction of the estimate variance.
Concerning the property of handling unstable plants, the limitation here is that model sets are restricted to be uniformly stable, as specified in definition 5.4.2. Because of their specific structure, only ARX and ARMAX model sets can provide uniformly stable predictors while the models themselves are unstable. In order to verify this, note that for an ARMAX model set the predictor filter is given by
$$\hat y(t|t-1;\theta) = \frac{B(q,\theta)}{C(q,\theta)}u(t) + \left[1 - \frac{A(q,\theta)}{C(q,\theta)}\right]y(t).$$
This predictor filter (and its derivative) is uniformly stable for parametrizations where the roots of C(z, θ) are restricted to the interior of the unit circle, i.e. |z| < 1; no restriction has to be laid upon the roots of A(z, θ).
As a result, consistent identification of unstable input/output plants G0 is possible provided that one is dealing with both a data generating system and a parametrized model set in one and the same ARX or ARMAX form. This conditioned result is indicated in Table 9.2 by the symbol 2.
Note that the overview of properties refers to the general direct method only. For the specific approach based on the tailor-made parametrization, as presented in section 9.6, some of the properties are different.
1. Identify the closed-loop system using r as input and y as output, using a standard (open-loop) model set;
2. Determine the open-loop system parameters from the closed-loop model obtained in step 1, using knowledge of the controller.
The first step of the procedure is a standard open-loop type of identification problem, as the reference signal r is assumed to be uncorrelated with the disturbance signal e.
Employing (9.5) for the data generating system, it follows that the system that is the subject of identification in the first step is given by the closed-loop transfer functions G0S0 and S0H0. In the second step the open-loop model is solved from
$$\frac{G(q,\theta)}{1 + G(q,\theta)C(q)} = G_c(q,\hat\theta_N) \qquad (9.70)$$
$$\frac{H(q,\theta)}{1 + G(q,\theta)C(q)} = H_c(q,\hat\theta_N) \qquad (9.71)$$
with respect to the parameter vector θ which parametrizes the open-loop plant. The estimated model from (9.70), (9.71) can be written as
$$G(q,\hat\theta) = \frac{G_c(q,\hat\theta_N)}{1 - C(q)G_c(q,\hat\theta_N)} \qquad (9.72)$$
$$H(q,\hat\theta) = \frac{H_c(q,\hat\theta_N)}{1 - C(q)G_c(q,\hat\theta_N)} \qquad (9.73)$$
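A minimal sketch of the two steps, assuming the System Identification and Control System Toolboxes, hypothetical signals r, y, and the controller C available as an LTI object:

Gc = oe(iddata(y, r, 1), [4 4 1]);   % step 1: closed-loop transfer r -> y
G  = feedback(tf(Gc), -C);           % step 2: G = Gc/(1 - C*Gc), cf. (9.72)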
                                          Indirect method
Consistency (G(q, θ̂N), H(q, θ̂N))          +
Consistency G(q, θ̂N)                       +
Tunable bias                               +
Fixed model order                          -
Unstable plants                            +
(G(q, θ̂N), C) stable                       2
C assumed known                            yes
Table 9.3: Properties of the indirect method.
Concerning the asymptotic bias expression, the first step of the indirect identification delivers an asymptotic criterion:
$$\theta^* = \arg\min_\theta \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\frac{G_0(e^{i\omega})}{1 + C(e^{i\omega})G_0(e^{i\omega})} - G_c(e^{i\omega},\theta)\right|^2 \frac{|L(e^{i\omega})|^2\,\Phi_r(\omega)}{|H_c(e^{i\omega},\theta)|^2}\,d\omega \qquad (9.74)$$
with L the prefilter applied to the prediction error.
with K0(z) a rational transfer function, with K0(z) and K0(z)^{-1} monic, proper and stable, and {w(t)} a sequence of independent, identically distributed, zero mean random variables (white noise).
In the joint i/o approach, as discussed e.g. in Ng et al. (1977), Anderson and Gevers (1982) and Söderström and Stoica (1989), the combined signal (y(t) u(t))^T is considered as a multivariable time series having a rational spectral density; as a result there has to exist an (innovations) representation satisfying
$$\begin{pmatrix} u(t) \\ y(t) \end{pmatrix} = \Gamma_0(q)\begin{pmatrix} e_1(t) \\ e_2(t) \end{pmatrix} \qquad (9.77)$$
with (e1(t) e2(t))^T a white noise process with covariance matrix Λ0, and Γ0(z) a rational transfer function matrix with Γ0(z) as well as Γ0(z)^{-1} monic, proper and stable.
Since y and u are available from measurements, a model for (9.77) can be estimated by applying any prediction error identification method. This situation actually refers to the open-loop identification of a model without a measurable input term, i.e. only the identification of a (multivariable) noise model H(q, θ).
Using an appropriate prediction error identification method, as in the open-loop case, it is possible to construct a consistent estimate Γ(z, θ̂N) of Γ0(z).
In the second step of the procedure, estimates of G0, H0, C0, K0 and Λ0 are derived from (the estimate of) Γ0, where C0 refers to the controller present in the data generating system.
This joint i/o method not only estimates the system dynamics G0 and H0, but also delivers estimates of the controller C0, the input-shaping filter K0 and the input noise covariance Λ0.
We will consecutively discuss the several steps in this procedure.
we can write
$$\begin{pmatrix} u(t) \\ y(t) \end{pmatrix} = \Gamma_0(q)\begin{pmatrix} w_1(t) \\ e(t) \end{pmatrix} \qquad (9.80)$$
with
$$\Gamma_0(z) = \begin{pmatrix} S_0K_0 & -C_0S_0H_0 \\ G_0S_0K_0 & H_0S_0 \end{pmatrix}\begin{pmatrix} 1 & -C_0(\infty) \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} S_0K_0 & S_0K_0C_0(\infty) - S_0C_0H_0 \\ S_0G_0K_0 & S_0H_0 + S_0G_0K_0C_0(\infty) \end{pmatrix} \qquad (9.81)$$
$$w_1(t) = w(t) - C_0(\infty)e(t) \qquad (9.82)$$
The representation (9.81) is an innovations representation of (u(t) y(t))^T under the additional condition that both Γ0(z) and Γ0(z)^{-1} are stable.
Stability of Γ0(z) is assured if S0, S0C0 and S0C0H0 are stable, which holds true because of the stability assumptions on the closed-loop system.
With the matrix inversion lemma (see e.g. Kailath, 1980), we can construct the inverse of Γ0(z) as
$$\Gamma_0(z)^{-1} = \begin{pmatrix} 1 & -C_0(\infty) \\ 0 & 1 \end{pmatrix}\begin{pmatrix} K_0^{-1} & K_0^{-1}C_0 \\ -H_0^{-1}G_0 & H_0^{-1} \end{pmatrix} \qquad (9.84)$$
Under additional stability assumptions on H0(z)^{-1}G0(z) and K0(z)^{-1}C0(z) this inverse is stable too, and consequently the representation (9.81) is the unique innovations representation of (u(t) y(t))^T.
This result for the joint i/o method shows similar aspects as the indirect identification method. Once the first step in the procedure has provided a consistent estimate, it is possible to obtain a consistent estimate of the data generating system in the second step. However, note that whenever the first step has not provided an exact description, in this case of Γ0(z) and Λ0, the second step in the procedure becomes an approximation for which it is hard to draw any conclusions on the resulting estimates.
Note that this joint i/o method allows the identification of not only the data generating system (G0, H0) but also of the controller C0.
Apparently this joint i/o method is able to deal with problem (a) of the problems formulated in section 9.1.3. Its properties are summarized in Table 9.4.
with M(q, θ) the parametrized transfer function between r and y. However, since we know the structure of this transfer function, we can parametrize M(q, θ) in terms of the original parameters of the plant G(q, θ), by writing
$$M(q,\theta) = G(q,\theta)S(q,\theta) = \frac{G(q,\theta)}{1 + G(q,\theta)C(q)} \qquad (9.87)$$
leading to the prediction error
$$\varepsilon(t,\theta) = y(t) - \frac{G(q,\theta)}{1 + G(q,\theta)C(q)}\,r(t) \qquad (9.88)$$
Since
$$S_0G_0 - SG = \frac{G_0}{1+G_0C} - \frac{G}{1+GC} = \frac{G_0(1+GC) - G(1+G_0C)}{(1+G_0C)(1+GC)} = \frac{G_0 - G}{(1+G_0C)(1+GC)}$$
it follows that
$$\theta^* = \arg\min_\theta \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|S_0(e^{i\omega})[G_0(e^{i\omega}) - G(e^{i\omega},\theta)]S(e^{i\omega},\theta)\right|^2\Phi_r(\omega)\,d\omega \qquad (9.91)$$
It can be verified that whenever there exists a θ such that G(q, θ) = G0(q), this choice will be a minimizing argument of the integral expression above, and moreover that, provided that r is persistently exciting of sufficient order, this solution will be unique. This leads to the following proposition.
Proposition 9.6.1 Let S be a data generating system and let Z∞, consisting of signals r, u, y, be a data sequence corresponding to Assumptions 5.7.1, 5.7.2, with {r(t)} uncorrelated with {e(t)}. Let M be a uniformly stable model set, parametrized according to (9.88), and satisfying G0 ∈ G.
If {r(t)} is persistently exciting of a sufficiently high order, then G(e^{iω}, θ*) = G0(e^{iω}) for all ω.
The proposition shows that this method can deal with problems (b)-(c) as formulated in the introduction section. However, the price that has to be paid is that one has to deal with a complicated parametrized model structure (9.88), with its resulting computational burden.
An additional remark has to be made with respect to the requirement that the model set M should be uniformly stable. By definition 5.4.2 this requires that the considered parameter set Θ is connected. Now, in the given situation this is not trivially satisfied. When evaluating the model structure as given in (9.88) for the situation that G(q, θ) is parametrized as a common quotient of two polynomials with free coefficients, the region of θ for which the transfer function G(q, θ)/(1 + C(q)G(q, θ)) is stable can be composed of two (or more) disconnected regions. This implies that whenever a gradient method is started to optimize the identification criterion from within one of the (disconnected) regions, one will never be able to arrive at a possibly optimal estimate in one of the other regions.
An example illustrating this effect is given in figure 9.2, where the criterion function V_N(θ) is sketched for the situation of a constant plant G0 = 3.5 controlled by a 7th order controller, with white noise signals entering as reference signal and as disturbance. The plant is parametrized by one scalar parameter: G(q, θ) = θ.
Figure 9.2: Criterion function V_N(θ) for static model G(q, θ) = θ for closed-loop identification with a tailor-made parametrization; θ0 = 3.5.
In this case there are three disconnected areas within Θ = IR that correspond to a stable closed-loop system. The parameter regions (−∞, 0], [1.27, 2.64] and [4.69, 9.98] lead to an unstable closed-loop system. If the nonlinear optimization is started in the region shown in the left figure or the right figure, the resulting parameter estimate will be on the boundary of stability, and will not correspond to the global minimum of the criterion function. The latter situation can only occur for the connected region shown in the middle figure.
The problem with a possible lack of connectedness of the parameter region can be avoided by identifying models that are of sufficiently high order. It is shown in Van Donkelaar and Van den Hof (2000) that a sufficient condition for connectedness of the parameter set is that the model order of G(q, θ) is at least as large as the order of the controller.
The tailor-made parametrization applied to an instrumental variable identification criterion is considered in Gilson and Van den Hof (2001).
                                          Tailor-made
Consistency (G(q, θ̂N), H(q, θ̂N))          2
Consistency G(q, θ̂N)                       2
Tunable bias                               +
Fixed model order                          +
Unstable plants                            +
(G(q, θ̂N), C) stable                       +
C assumed known                            yes
Table 9.5: Properties of the method with tailor-made parametrization.
McMillan degree. This method avoids complicated parametrized model sets, as are required in the direct method discussed in section 9.3.5, and does not need prior knowledge about the dynamics of the controller C(q). It is composed of two consecutive identification steps that can be performed with standard (open-loop) methods.
We consider the closed-loop experimental situation as discussed before, and similar to (some of) the previous methods we assume that there is available a measurable and persistently exciting external signal r that is uncorrelated with the noise e.
The two-stage method was introduced in Van den Hof and Schrama (1993).
The system relation (9.6) for the closed-loop case shows that the input signal satisfies
$$u(t) = S_0(q)r(t) - C(q)S_0(q)H_0(q)e(t). \qquad (9.92)$$
Since r and e are uncorrelated signals, and u and r are available from measurements, it follows that we can identify the sensitivity function S0 in an open-loop way. To this end, consider a model set determined by
$$u(t) = S(q,\beta)r(t) + R(q,\gamma)\varepsilon_u(t) \qquad (9.93)$$
where ε_u(t) is the one-step-ahead prediction error of u(t), S and R are parametrized independently, and the estimate S(q, β̂N) of S0(q) is determined by a least squares prediction error criterion based on ε_u(t).
From the results on open-loop identification we know that we can identify S0(q) consistently irrespective of the noise term in (9.92). Consistency of S(q, β̂N) can of course only be reached when S0 ∈ T := {S(q, β) | β ∈ B}.
In the second step of the procedure, we employ the other system relation (9.5),
$$y(t) = G_0(q)u^r(t) + S_0(q)H_0(q)e(t) \qquad (9.95)$$
with u^r(t) = S0(q)r(t) the noise-free part of the input. Since u^r and e are uncorrelated, it follows from (9.95) that if u^r were available from measurements, G0 could be estimated in an open-loop way, using the common open-loop techniques. Instead of knowing u^r, we have an estimate of this signal available through
$$\hat u^r_N(t) = S(q,\hat\beta_N)r(t) \qquad (9.97)$$
Accordingly we rewrite (9.95) as
$$y(t) = G_0(q)\hat u^r_N(t) + S_0(q)H_0(q)e(t) + G_0(q)[S_0(q) - S(q,\hat\beta_N)]r(t) \qquad (9.98)$$
The second step in the procedure consists of applying a standard prediction error identification method to a model set
$$y(t) = G(q,\theta)\hat u^r_N(t) + W(q,\eta)\varepsilon_y(t) \qquad (9.99)$$
with G(q, θ), W(q, η) parametrized independently, θ ∈ Θ ⊂ IR^{dθ}, η ∈ Ξ ⊂ IR^{dη}, and a least squares identification criterion determining
$$(\hat\theta_N, \hat\eta_N) = \arg\min_{\theta,\eta}\frac{1}{N}\sum_{t=1}^N \varepsilon_y(t,\theta,\eta)^2. \qquad (9.100)$$
Proposition 9.7.1 Let S be a data generating system and consider two independent data sequences containing signals r, u, y corresponding to Assumptions 5.7.1, 5.7.2, with {r(t)} uncorrelated with {e(t)}.
Consider the two-stage identification procedure presented in this section with least squares identification criteria and uniformly stable model sets T (9.93) in the first step and M (9.99) in the second step, let {r(t)} be persistently exciting of a sufficiently high order, and let the two steps of the procedure be applied to the two independent data sequences of the process.
If S0 ∈ T and G0 ∈ G, then G(e^{iω}, θ̂N) → G0(e^{iω}), for all ω, with probability 1 as N → ∞.
Proof: The proof is given in Van den Hof and Schrama (1993). The two independent data sequences are taken in order to guarantee that the estimated sensitivity function in the first step, and thus the simulated noise-free input signal for the second step, is uncorrelated with the noise disturbance in the second step. □
[Figure 9.3: Block diagram of the closed-loop system, showing the intermediate signal u^r.]
As a result of this proposition, this method is able to deal with problem (b), the consistent identification of G0(q), irrespective of the modelling of the noise disturbance on the data. Note that for the identification of the sensitivity function in the first step of the procedure, we can simply apply a (very) high order model. We will only use the estimate S(q, β̂N) to simulate û^r_N(t). Of course considerations of variance may restrict the model order to be used. A block diagram, indicating the recasting of the closed-loop problem into two open-loop problems, is sketched in Figure 9.3, and a minimal sketch of the two steps in code is given below.
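The sketch assumes the System Identification Toolbox and hypothetical signals r, u, y with unit sampling time; a high order FIR model is used in the first step:

S   = arx(iddata(u, r, 1), [0 30 0]);  % step 1: FIR model of S0 from r -> u
urN = sim(S, r);                       % reconstructed noise-free input (9.97)
G   = oe(iddata(y, urN, 1), [3 3 0]);  % step 2: output error estimate of G0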
Additionally, this method also appears to be able to treat problem (c), the explicit characterization of the asymptotic model in the case that G0 ∉ G. This will be clarified now.
The above proposition can be extended to also include consistency of the noise model H(q, η̂N).
In the case that we accept undermodelling in the second step of the procedure (G0 ∉ G), the bias distribution of the asymptotic model can be characterized. As the second step in this procedure is simply an open-loop type identification problem, the expression for the asymptotic bias distribution, as derived for the open-loop situation, can simply be applied. This leads to the following result.
[Figure 9.4: Block diagram of the second-step identification problem, showing the effect of the sensitivity estimation error S0 − S on the reconstructed input û^r.]
and
$$\beta^* = \arg\min_\beta \frac{1}{2\pi}\int_{-\pi}^{\pi}|S_0(e^{i\omega}) - S(e^{i\omega},\beta)|^2\,\frac{\Phi_r(\omega)}{|H^*(e^{i\omega})|^2}\,d\omega \qquad (9.104)$$
Proposition 9.7.3 shows that even if in both steps of the procedure non-consistent estimates are obtained, the bias distribution of G(q, θ*) is characterized by a frequency domain expression which now becomes dependent on the identification result from the first step (cf. (9.104)), but which still does not depend on noise disturbance terms.
Remark 9.7.4 Note that in (9.103) the integrand expression can be rewritten using the relation
$$G_0S_0 - G(\theta)S(\beta^*) = [G_0 - G(\theta)]S_0 + G(\theta)[S_0 - S(\beta^*)] \qquad (9.105)$$
(arguments $e^{i\omega}$ suppressed), which shows how an error made in the first step affects the estimation of G0. If S(q, β*) = S0(q), then (9.103) reduces to (9.102). If the error made in the first step is sufficiently small, it will have a limited effect on the final estimate G(q, θ*). This effect is also illustrated in the block diagram of Figure 9.4.
Note that the results presented in this section show that a consistent estimate of the sensitivity function S0 is not even necessary to obtain a good approximate identification of the transfer function G0. Equations (9.103) and (9.105) suggest that as long as the error in the estimated sensitivity function is sufficiently small, the i/o transfer function can be identified accurately. In this respect one could also think of applying a high order FIR (finite impulse response) model structure in the first step, having a sufficient polynomial degree to describe the essential dynamics of the sensitivity function. This model structure will be applied in the simulation example described later on.
Figure 9.5: Bode amplitude plot of the exact sensitivity function S0 (solid line) and the estimated sensitivity function S(q, β̂N) (dashed line).
The controller is given by
$$C(q) = q^{-1} - 0.8\,q^{-2} \qquad (9.107)$$
Note that the true sensitivity function S0 is a rational transfer function of order 4. The magnitude Bode plot of the estimated sensitivity function is depicted in figure 9.5, together with the exact one.
The estimate S(q, β̂N) is used to reconstruct a noise-free input signal û^r_N according to (9.97). Figure 9.6 shows this reconstructed input signal, compared with the real input u(t) and the optimally reconstructed input signal u^r(t) = S0(q)r(t).
Figure 9.6: Simulated input signal u (solid line), non-measurable input signal u^r caused by r (dashed line) and reconstructed input signal û^r_N (dotted line).
Note that, despite the severe noise contribution on the signal u caused by the feedback loop, the reconstruction of u^r by û^r_N is extremely accurate.
In the second step an output error model structure is applied such that G0 ∈ G, by taking
$$G(q,\theta) = \frac{b_0 + b_1q^{-1} + b_2q^{-2}}{1 + a_1q^{-1} + a_2q^{-2}} \quad \text{and} \quad W(q,\eta) = 1 \qquad (9.110)$$
Figure 9.7 shows the result of G(q, θ̂N). The magnitude Bode plot is compared with the second order model obtained from a one-step direct identification strategy in which an output error method is applied, using only the measurements of u and y. The results clearly show the degraded performance of the direct identification strategy, while the two-stage method gives accurate results. This is also clearly illustrated in the Nyquist plot of the same transfer functions, as depicted in figure 9.8.
The degraded result of the direct identification method is here caused by the fact that in this method the incorrect modelling of the noise (no noise model in the output error model structure) deteriorates the estimation of the i/o transfer function. This mechanism, which is (asymptotically) not present in the open-loop situation, becomes very apparent in the closed-loop case. □
Remark 9.7.6 Concerning the ability of the method to deal with unstable plants in the second step of the procedure: again, uniform stability of the model set M will generally require G to be restricted to contain stable models only. Identification of unstable plants would be possible if the set of transfer functions G0, S0H0 would share the same denominator, and thus could be modelled with an ARMAX model. Compare with figure 9.4 to see that this set of transfer functions describes the data generating system in the second step of the identification procedure. However, for such a model structure used in the second step of the procedure, the tunability of the asymptotic bias expression would be lost. Incorporating restrictions on the input signal u^r might lead to appropriate conditions for handling unstable plants also; however, this has not been shown yet. The asymptotic results of Propositions 9.7.2 and 9.7.3 remain valid also for unstable plants.
Figure 9.7: Bode amplitude plot of transfer function G0 (solid line), output error estimate G(q, θ̂N) obtained from the two-stage method (dashed line), and output error estimate obtained from the direct method (dotted line). Order of the models is 2.
Figure 9.8: Nyquist curve of transfer function G0 (solid line), output error estimate G(q, θ̂N) obtained from the two-stage method (dashed line), and output error estimate obtained from the direct method (dotted line). Order of the models is 2.
                                          Two-stage method
Consistency (G(q, θ̂N), H(q, θ̂N))          +
Consistency G(q, θ̂N)                       +
Tunable bias                               +
Fixed model order                          +
Unstable plants                            -
(G(q, θ̂N), C) stable                       -
C assumed known                            no
Table 9.6: Properties of the two-stage method.
Based on relation (9.112), we can obtain an estimate D(q) of the transfer function S0 between r and u, by applying standard open-loop techniques and using the measured signals r and u. If r is uncorrelated with e, this is an open-loop identification problem.
Based on relation (9.111), we can similarly obtain an estimate N(q) of the transfer function G0S0 between r and y, by applying similar open-loop methods and using the measured signals r and y.
Note that the pair (G0S0, S0) can be considered a factorization of G0, since G0S0 · (S0)^{-1} = G0. Since the closed-loop system is stable, the two separate factors composing this factorization are also stable, and moreover they can be identified from data measured in the closed loop.
As a result, we can arrive at an estimate G(z) = N(z)D(z)^{-1} of the open-loop plant G0(z). If the estimates of both factors are obtained from independent data sequences, we may expect that the resulting estimate G(z) is consistent in the case that both estimators D(z) and N(z) are consistent.
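A sketch of this factor-based estimate, assuming the System Identification and Control System Toolboxes and hypothetical signals r, u, y:

D = oe(iddata(u, r, 1), [5 5 0]);    % estimate of S0 (r -> u)
N = oe(iddata(y, r, 1), [5 5 1]);    % estimate of G0*S0 (r -> y)
G = minreal(tf(N) / tf(D));          % estimate of G0 = (G0*S0)*S0^{-1}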
When the external excitation signal r is not measurable, the method sketched above cannot be followed directly. However, it appears that this lack of information can be completely compensated for if we have knowledge of the controller C(z).
Note that from the system relations (9.1)-(9.4) it follows that r(t) = u(t) + C(q)y(t). Using knowledge of C(q), together with measurements of u and y, we can simply reconstruct the reference signal r in the relations (9.112), (9.111) as used above. So instead of a measurable signal r, we can equally well deal with the situation that y, u are measurable and C is known. Note that for arriving at consistent models, the basic requirement of persistence of excitation of r is then replaced by a similar condition on the signal u + Cy.
It appears to be true that the signal {u(t) + C(q)y(t)} is uncorrelated with {e(t)} provided that r1, r2 are uncorrelated with e. Additionally, we can still use the relations (9.111), (9.112) even if r is not measurable. In that case we will consider r(t) to be given by r(t) = u(t) + C(q)y(t).
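A minimal sketch of this reconstruction, with the controller given by hypothetical polynomial coefficient vectors numC, denC:

r_rec = u + filter(numC, denC, y);   % r(t) = u(t) + C(q)y(t)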
It has been discussed that (G0S0, S0) is a specific stable factorization of G0, of which the two factors can be identified from data measured in the closed loop. We can raise the question whether there exist more of these factorizations that can be identified from similar data.
By introducing an artificial signal
$$x(t) = F(q)r(t) \qquad (9.114)$$
with F(z) any chosen and fixed stable rational transfer function, we can rewrite the system relations as (9.115), (9.116), so that the factors (G0S0F^{-1}, S0F^{-1}) can be identified as above, provided of course that the factors themselves are stable. This situation is sketched in figure 9.9.
We will now characterize the freedom that is available in choosing this fixed transfer function F. To this end we need the notion of a coprime factorization over IRH∞.
(a) G0 = ND^{-1}, and
(b) there exist stable transfer functions X, Y ∈ IRH∞ such that XN + YD = I.
The equation in (b) above is generally denoted as the (right) Bezout identity.
There exists a dual definition for left coprime factorizations (lcf) (D, N) with G0 = D^{-1}N, in which the condition in (b) has to be replaced by NX + DY = I. For scalar transfer functions, right and left coprimeness are equivalent.
The interpretation of coprimeness of two stable factors over IRH∞ is that they do not have any common unstable zeros, i.e. zeros in |z| ≥ 1, that cancel in the quotient of the two factors.
$$G_0(z) = \frac{z - 0.5}{z - 0.7}. \qquad (9.117)$$
Examples of coprime factorizations (over IRH∞) N0, D0 of this function are:
- N0 = G0; D0 = 1; note that X = 0, Y = 1 will satisfy the Bezout identity XN0 + YD0 = 1.
- $N_0 = \frac{z-0.5}{z-0.9}$; $D_0 = \frac{z-0.7}{z-0.9}$; here e.g. $X = 3.5(1 - 0.9z^{-1})$ and $Y = -2.5(1 - 0.9z^{-1})$ will satisfy the Bezout identity.
- Note that the factorization $N_0 = \frac{z-1.2}{z-0.7}$, $D_0 = \frac{z-1.2}{z-0.5}$ is not coprime.
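The Bezout identity for the second factorization can be checked numerically; a sketch assuming the Control System Toolbox:

z  = tf('z', 1);
N0 = (z-0.5)/(z-0.9);   D0 = (z-0.7)/(z-0.9);
X  = 3.5*(1 - 0.9/z);   Y  = -2.5*(1 - 0.9/z);
minreal(X*N0 + Y*D0)     % returns the static system 1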
The notion of coprimeness of factorizations can be used to characterize the freedom that is present in the choice of the filter F.
Proposition 9.8.3 Consider a data generating system according to (9.1), (9.2), such that the closed-loop system is internally stable, and let F(z) be a rational transfer function defining x(t) as in (9.114). Let the controller C have a left coprime factorization (Dc, Nc). Then the following two expressions are equivalent:
a. the filter F yields stable mappings (y, u) → x and x → (y, u);
b. F(z) = W Dc with W any stable and stably invertible rational transfer function. □
[Figure 9.9: The closed-loop system of figure 9.1 extended with the artificial signal x = F(u + Cy) (upper diagram), and its equivalent representation in terms of the coprime factors N0,F and D0,F driven by x (lower diagram).]
Proposition 9.8.4 Consider the situation of Proposition 9.8.3. For any choice of F = W Dc with W stable and stably invertible, the induced factorization of G0, given by (G0S0F^{-1}, S0F^{-1}), is right coprime. □
Proof: Let (X, Y) be right Bezout factors of (N, D), and denote [X1 Y1] = W(DcD + NcN)[X Y]. Then by employing (9A.1) it can simply be verified that X1, Y1 are stable and are right Bezout factors of (G0S0F^{-1}, S0F^{-1}). □
Proposition 9.8.5 The filter F yields stable mappings (y, u) → x and x → (y, u) if and only if there exists an auxiliary system Gx with rcf (Nx, Dx), stabilized by C, such that F = (Dx + CNx)^{-1}. For all such F the induced factorization G0 = N0,F D0,F^{-1} is right coprime.
Proof: Consider the situation of Proposition 9.8.3. First we show that for any C with lcf Dc^{-1}Nc and any stable and stably invertible W there always exists a system Gx with rcf NxDx^{-1} such that W = [DcDx + NcNx]^{-1}.
Take a system Ga with rcf NaDa^{-1} that is stabilized by C. With Lemma 9A.1 it follows that DcDa + NcNa = Λ with Λ stable and stably invertible. Then choosing Dx = DaΛ^{-1}W^{-1} and Nx = NaΛ^{-1}W^{-1} delivers the desired rcf of a system Gx as mentioned above.
Since F = W Dc, substituting W = [DcDx + NcNx]^{-1} yields F = [Dx + CNx]^{-1}. □
We will comment upon this result in the next section, where we discuss the dual Youla parametrization.
Employing this specific characterization of F, the coprime plant factors that can be identified from closed-loop data satisfy
$$\begin{pmatrix} N_{0,F} \\ D_{0,F} \end{pmatrix} = \begin{pmatrix} G_0(I + CG_0)^{-1}(I + CG_x)D_x \\ (I + CG_0)^{-1}(I + CG_x)D_x \end{pmatrix}. \qquad (9.118)$$
In terms of identifying a model of the plant dynamics, we are faced with the system relations (9.115), (9.116), as also sketched in figure 9.9. We can now formulate the prediction error for the 1-input, 2-output dynamical system relating the input x(t) to the output col(y(t), u(t)) by
$$\varepsilon(t,\theta) = \begin{pmatrix} H_y(q,\theta)^{-1} & 0 \\ 0 & H_u(q,\theta)^{-1} \end{pmatrix}\begin{pmatrix} y(t) - N(q,\theta)x(t) \\ u(t) - D(q,\theta)x(t) \end{pmatrix} \qquad (9.119)$$
where H_y(q, θ) and H_u(q, θ) are the noise models in the two transfer functions. If these noise models are fixed, i.e. H_y(q, θ) = H_y(q) and H_u(q, θ) = H_u(q), then a least squares identification criterion will yield the asymptotic estimate determined by¹
$$\theta^* = \arg\min_\theta \frac{1}{2\pi}\int_{-\pi}^{\pi}\begin{pmatrix} N_{0,F} - N(\theta) \\ D_{0,F} - D(\theta) \end{pmatrix}^*\begin{pmatrix} |H_y|^{-2} & 0 \\ 0 & |H_u|^{-2} \end{pmatrix}\begin{pmatrix} N_{0,F} - N(\theta) \\ D_{0,F} - D(\theta) \end{pmatrix}\Phi_x\,d\omega. \qquad (9.120)$$
¹The arguments $e^{i\omega}$ have been suppressed for clarity.
Together with the relation x(t) = F(q)r(t) (9.114), the asymptotic parameter estimate can equivalently be characterized by
$$\theta^* = \arg\min_\theta \frac{1}{2\pi}\int_{-\pi}^{\pi}\left\{\frac{|G_0S_0 - N(\theta)F|^2}{|H_y|^2} + \frac{|S_0 - D(\theta)F|^2}{|H_u|^2}\right\}\Phi_r\,d\omega. \qquad (9.123)$$
The integral expression now changes from a simple additive mismatch on the open loop
plant dynamics (as in Chapter 5) to a weighted mismatch on sensitivity functions and
plant-times-sensitivity functions, clearly exhibiting the role of the controller in this
criterion. By choosing a model structure in which Hy and Hu are fixed (not parametrized),
the asymptotic identification criterion becomes explicit, as meant in Problem (c) in the
introduction. However, the model parametrization appearing in this criterion is still in the
form of parametrized factors N (θ), D(θ) and not of the transfer function model G(θ).
In order to be able to characterize the asymptotic criterion in terms of G0 (q) and G(q, θ),
a restriction on the parametrization of N (θ) and D(θ) is required.
Requiring that

    D(q, θ) = S(q, θ)F (q)^{-1} ,

with S(q, θ) = [1 + C(q)G(q, θ)]^{-1} the sensitivity function of the parametrized model, would
provide us with an asymptotic identification criterion that can be characterized in terms of
G0 (q), S0 (q) and G(q, θ), S(q, θ). This parametrization restriction implies that

    D(q, θ) + C(q)N (q, θ) = F (q)^{-1} ,    (9.126)

and actually connects the two separate coprime factors to each other. Note that the two
plant factors (N0,F , D0,F ) satisfy the (similar) relation

    D0,F + C N0,F = F^{-1} .
This expression will lead to a monic transfer function H(q, θN ) provided that we have
restricted the plant model G(q, θN ) to be strictly proper. If this is not the case, then more
complex expressions have to be used for the construction of a (monic) noise model, similar
to the situation of the joint input/output method of Section 9.5. Generally speaking,
this coprime factor identification approach can be considered to be a generalization of the
joint i/o method.
For analyzing the results on consistency, the open-loop framework as presented in Chapter
5 can fully be used. This is due to the fact that the closed-loop identification problem
has actually been split into two open-loop-type identification problems. If the model sets
for parametrizing N (q, θ), D(q, θ) have been chosen properly, then consistent estimates of
the plant factors can be achieved under the generally known conditions. This situation
refers to Problems (a) and (b) as mentioned in the introduction of this chapter. An explicit
approximation criterion as considered in Problem (c) can be formulated; this approximation
criterion can be written in terms of the model transfer function G(q, θ) provided that a
parametrization constraint is taken into account, as discussed in (9.126).
As a final note we remark that due to this coprime factor framework the identification
method discussed does not meet any problems in handling unstable plants and/or unstable
controllers, as both are represented by a quotient of two stable factors.
In Table 9.7 a summary is given of the properties of this method. The coprime factor
identification method has been successfully applied to an X-Y positioning table (Schrama,
1992) and to a compact disc servo mechanism (Van den Hof et al., 1995).
Stated from the other point of view: for a given plant G0 with McMillan degree n, the
McMillan degree of the corresponding coprime factors N0,F , D0,F will be dependent on G0 ,
C, Gx and Dx , as can be understood from (9.118).
A relevant question then appears to be: under what conditions on F (q) will the coprime
factorizations N0,F , D0,F have a McMillan degree that is equal to the McMillan degree of
G0 ? If one restricts attention to coprime factorizations with this property, then one
can control the McMillan degree of the plant model by enforcing the model order of the
coprime factors separately.
Next we will consider a specific class of coprime factorizations that has the property
mentioned above.
which for the scalar case reduces to |N (e^{iω})|^2 + |D(e^{iω})|^2 = 1. □
By using normalized coprime factorizations one can avoid the identification of dynamics
that is redundantly present in both factors.
The principal scheme that can be followed now is composed of two steps:
1 Construct a data filter F such that the coprime factors N0,F , D0,F that are accessible
from closed-loop data become normalized;
2 Identify N0,F and D0,F using a parametrization in terms of polynomial fractions with
a common denominator:

    N (q, θ) = B(q^{-1} , θ) / D(q^{-1} , θ),    D(q, θ) = A(q^{-1} , θ) / D(q^{-1} , θ).    (9.129)
² In the exceptional case that G0 contains all-pass factors, (one of) the normalized coprime factors will
have McMillan degree < n; see Tsai et al. (1992).
Figure 9.10: Bode magnitude (upper) and phase (lower) plot of G0 (solid), N0 (dashed)
and D0 (dash-dotted).
The plant model then follows as

    G(q, θN ) = B(q^{-1} , θN ) / A(q^{-1} , θN ).    (9.130)
The normalization in Step 1 of this procedure is necessary in order to guarantee that one
can indeed restrict attention to coprime factor models with a McMillan degree that is similar
to the McMillan degree of the plant model. Without normalization, N0,F and
D0,F can be (very) high order factorizations, which require high order models for accurate
modelling.
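A minimal sketch of Step 2, under the assumption of noise-free data and hypothetical low-order factors: the common-denominator parametrization (9.129) makes the joint fit of N and D a linear least-squares problem, after which the plant model follows as in (9.130).

```python
import numpy as np

# Sketch (hypothetical data and orders): equation-error least squares for the
# common-denominator parametrization (9.129):
#   Dbar(q^-1) y = B(q^-1) x,   Dbar(q^-1) u = A(q^-1) x,
# with Dbar = 1 + d1 q^-1 shared by both factors.
rng = np.random.default_rng(0)
T = 2000
x = rng.standard_normal(T)                       # external signal x = F r

# simulate hypothetical "true" factors with common denominator 1 - 0.7 q^-1
b0, b1 = 0.0, 0.5                                # numerator of N0,F
a0, a1 = 1.0, -0.3                               # numerator of D0,F
y = np.zeros(T); u = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t-1] + b0 * x[t] + b1 * x[t-1]
    u[t] = 0.7 * u[t-1] + a0 * x[t] + a1 * x[t-1]

# joint regression; unknowns theta = (d1, b0, b1, a0, a1)
rows, rhs = [], []
for t in range(1, T):
    rows.append([-y[t-1], x[t], x[t-1], 0, 0]); rhs.append(y[t])
    rows.append([-u[t-1], 0, 0, x[t], x[t-1]]); rhs.append(u[t])
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print("estimated (d1, b0, b1, a0, a1):", theta.round(3))   # (-0.7, 0, 0.5, 1, -0.3)
```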
Step 1
The construction of F (q) as mentioned above is guided by the following principle. If Gx =
G0 and (Nx , Dx ) is a normalized right coprime factorization of Gx , then N0,F = Nx and
D0,F = Dx , which are normalized as well. This can be verified by considering the expression
(9.118) for the factors N0,F and D0,F . The resulting filter F (q) is then given by

    F (q) = [Dx (q) + C(q)Nx (q)]^{-1} .    (9.131)
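A frequency-domain sketch of Step 1 under simplifying assumptions: the "normalized" factors below are constructed pointwise per frequency (a stand-in for a true spectral factorization, which would be needed for rational factors), after which the filter F of (9.131) is formed and the normalization of the induced factors (N0,F , D0,F ) is verified.

```python
import numpy as np

# Sketch (hypothetical systems, frequency responses only): build
# F = (Dx + C Nx)^{-1} from a normalized factorization of Gx = G0 and check
# that the induced factors N0F = G0 S0 / F, D0F = S0 / F are normalized.
w = np.linspace(1e-3, np.pi, 400)
z = np.exp(1j * w)

G0 = 0.2 / (z - 0.9)                 # hypothetical plant frequency response
C = 0.8 * np.ones_like(z)            # hypothetical controller

# pointwise "normalized" factors of Gx = G0 (stand-in for spectral factorization)
scale = np.sqrt(1.0 + np.abs(G0)**2)
Nx, Dx = G0 / scale, 1.0 / scale     # Gx = Nx/Dx = G0, |Nx|^2 + |Dx|^2 = 1

F = 1.0 / (Dx + C * Nx)              # data filter according to (9.131)
S0 = 1.0 / (1.0 + C * G0)
N0F, D0F = G0 * S0 / F, S0 / F       # induced coprime factors of G0

print(np.allclose(np.abs(N0F)**2 + np.abs(D0F)**2, 1.0))   # True
```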
The algorithm presented here has been worked out in more detail in Van den Hof et al.
(1995), where the consequences of inaccuracies occurring in the first step of the procedure
for the accuracy of the finally estimated plant models are also analyzed. One can
observe that part of the mechanism is similar to the two-stage method of closed-loop
identification. In the first step, exact plant information is in principle required to prepare
for the second step. In both methods, this first step is replaced by a high-order accurate
estimate of the plant.
Whereas the general coprime factor identification does not require knowledge of the im-
plemented controller in order to arrive at a plant model, the algorithm presented above
does require this knowledge in order to optimally tune the filter F according to (9.131).
(Block diagram of the compact disc servo configuration: the radial actuator G0 is driven by u, the track motion acts as a disturbance v on the position error y, the loop is closed by the controller C, and the excitation r1 is added to the actuator input.)
This experimental set-up is used to gather time sequences of 8192 points of the input u(t)
to the radial actuator G0 and the disturbed track position error y(t) in closed loop, while
exciting the control loop with a band-limited (100 Hz - 10 kHz) white noise signal r1 (t),
added to the input u(t).
The results of applying the two steps of the procedure presented in Section 9.8.2 are shown
in a couple of figures. Recall from Section 9.8.2 that in the first step the aim is to find an
auxiliary model Gx with a normalized rcf (Nx , Dx ), used to construct the filter F , such that
the factorization (N0,F , D0,F ) of G0 becomes (almost) normalized. The result of performing
Step 1 is depicted in Figure 9.12. A high (32nd) order model Gx has been estimated and
a normalized factorization (Nx , Dx ) has been obtained. Based on this factorization a filter
F has been constructed according to (9.131), giving rise to a coprime factorization of G0
in terms of (N0,F , D0,F ).
Figure 9.12: Results of Step 1 of the procedure. Upper figure: Bode magnitude plot of
estimated 32nd order coprime plant factors (Nx , Dx ) of auxiliary model Gx (solid) and
spectral estimates of the factors (N̂0,F , D̂0,F ) (dashed). Lower figure: plot of |N̂0,F (e^{iω})|^2 +
|D̂0,F (e^{iω})|^2 using the spectral estimates N̂0,F and D̂0,F .
Figure 9.12(a) shows an amplitude Bode plot of a spectral estimate of the factors N0,F and
D0,F , respectively denoted by N̂0,F and D̂0,F , along with the factorization Nx and Dx of the
high order auxiliary model Gx . Additionally, it is verified whether the spectral estimates
(N̂0,F , D̂0,F ) are (almost) normalized. To this end the expression |N̂0,F (e^{iω})|^2 + |D̂0,F (e^{iω})|^2
is plotted in Figure 9.12(b). The fact that this expression is very close to unity shows that
the coprime factorization that is induced by the specific choice of F is (almost) normalized.
Figure 9.13: Bode plot of the results in Step 2 of the procedure: identified 10th order
coprime factors (N̂ , D̂) (solid) and spectral estimates (dashed) of the factors N0,F , D0,F .
Figure 9.13 presents the result of a low (10th) order identified model of the factorization
(N0,F , D0,F ), obtained in the second step of the procedure outlined in Section 9.8.2.
The picture shows the amplitude and phase Bode plots, along with the previously obtained
spectral estimates of the coprime factors.
Amplitude and phase Bode plots of the finally obtained 10th order model G(q, θN ) are
depicted in Figure 9.14, along with the corresponding spectral estimate.
Scalar normalized coprime factorizations exhibit the property that their amplitude is bounded
by 1. As a result, the integral action in the plant is necessarily represented by a small de-
nominator factor D0,F at low frequencies, whereas the roll-off of G0 at high frequencies is
represented by a roll-off of the numerator factor N0,F . This is clearly visible in the
results.
This application shows the successful identification of an unstable plant from closed-loop
experimental data.
Figure 9.14: Bode plot of the results in Step 2 of the procedure: identified 10th order model
Ĝ = N̂ D̂^{-1} (solid) and spectral estimate N̂0,F D̂0,F^{-1} (dashed) of the plant G0 .
Proposition 9.9.1 (Desoer et al., 1980) Let C be a controller with rcf (Nc , Dc ), and
let Gx with rcf (Nx , Dx ) be any system that is stabilized by C. Then a plant G0 is stabilized
by C if and only if there exists an R ∈ IRH∞ such that

    G0 = (Nx + Dc R)(Dx - Nc R)^{-1} .    (9.132)
The above proposition determines a parametrization of the class of all linear, time-invariant,
finite-dimensional systems that are stabilized by the given C. The parametrization is de-
picted in a block diagram in Figure 9.15.
Note that Nx Dx^{-1} is just any (nominal, auxiliary) system that is stabilized by C. In the
case of a stable controller C, a valid choice is given by Gx = 0, as the zero transfer function
is stabilized by any stable controller.
This can also be used to illustrate the relation of this method with the more classical indirect
method of Section 9.4.
For a stable controller C, valid choices of the coprime factors are Nc = C, Dc = 1 and
Nx = 0, Dx = 1. This particular choice of coprime factors leads to the representation

    G0 = R / (1 - CR),

which directly corresponds to

    R = G0 / (1 + CG0 )

being the closed-loop transfer function from r to y. So, in this particular situation the
set of all plants that are stabilized by C is characterized by the set of all stable closed-loop
transfer functions.
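A quick pointwise check of this special case, with a hypothetical constant controller and plant frequency response:

```python
import numpy as np

# Numerical check (hypothetical stable C, frequency-domain): with Nc = C,
# Dc = 1, Nx = 0, Dx = 1 the Youla parameter is the closed-loop transfer
# R = G0/(1 + C G0), and G0 is recovered exactly as R/(1 - C R).
w = np.linspace(1e-3, np.pi, 300)
z = np.exp(1j * w)
G0 = (z - 0.2) / (z - 1.5)          # hypothetical (even unstable) plant
C = 2.0 * np.ones_like(z)           # hypothetical stable, stabilizing controller

R = G0 / (1.0 + C * G0)             # stable closed-loop transfer from r to y
print(np.allclose(R / (1.0 - C * R), G0))   # True: the parametrization is exact
```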
Now we will focus on the more general situation, which is also valid for e.g. unstable con-
trollers. The next proposition shows that for any given system G0 , the corresponding stable
R and the corresponding factorization (9.132) are uniquely determined.
Proposition 9.9.2 Consider the situation of Proposition 9.9.1. Then for any plant G0
that is stabilized by C the following holds:
(a) The stable transfer function R in (9.132) is uniquely determined by

    R = R0 := Dc^{-1} (I + G0 C)^{-1} (G0 - Gx )Dx ,    (9.133)

which proves the result, employing the relations C(I + G0 C)^{-1} = (I + CG0 )^{-1} C and
(I + G0 C)^{-1} G0 = G0 (I + CG0 )^{-1} . □
This proposition shows that the coprime factorization that is used in the Youla parametriza-
tion is exactly the same coprime factorization that we have constructed in the previous
section by exploiting the freedom in the prefilter F ; see Proposition 9.8.5.
Now we face the interesting question whether we can also exploit this Youla parametrization
in terms of system identification. To this end we first have to extend the corresponding
system representation as sketched in Figure 9.15 with the appropriate disturbance signal
on the output. We do this in the standard way by adding a noise term to the output, as
depicted in Figure 9.16.
Figure 9.16: Dual Youla-representation of the data generating system with noise.
It appears that the effect of the noise term e on the measured signals u and y can be
equivalently represented by adding an appropriately filtered noise term e at the output of
the dual-Youla parameter R0 . This is depicted in Figure 9.17.
Proposition 9.9.3 With respect to the measured signals u and y, the two schemes in
Figures 9.16 and 9.17 are equivalent, provided that K0 is chosen as

    K0 = Dc^{-1} W0 H0 .
Proof: For the moment we consider the transfer of signals within the open-loop part of
the scheme for the equation y = G0 u + H0 e; equivalently, we assume for the moment that
the feedback loop is disconnected. The transfer function from e to z is then given by

    Hze = K0 / (1 - R0 Nc Dx^{-1} ).
Figure 9.17: Dual Youla-representation of the data generating system with noise shifted to
the dual-Youla parameter.
Similarly,

    Hxe = Nc Dx^{-1} K0 / (1 - R0 Nc Dx^{-1} ),

    Hye = Dc Hze + Nx Hxe = (Dc K0 + Nx Nc Dx^{-1} K0 ) / (1 - R0 Nc Dx^{-1} ),

and this transfer should be equal to H0 (as in the open loop configuration). Rewriting the
latter equation leads to

    K0 (Dc + Nc Gx ) / (1 - R0 Nc Dx^{-1} ) = H0 ,

or equivalently

    K0 = H0 (1 - R0 Nc Dx^{-1} ) / (Dc + Nc Gx ).
Using the expression (9.133) for R0 it follows that

    K0 = H0 [1 - Dc^{-1} (1 + CG0 )^{-1} (G0 - Gx )Nc ] / (Dc + Nc Gx )    (9.137)

       = H0 [1 + CG0 - C(G0 - Gx )] / [(1 + CG0 )(1 + CGx )Dc ],    (9.138)

which shows that for K0 = Dc^{-1} W0 H0 the effect of e on the output y is the same as in
Figure 9.16. If the effect of e on y is the same, then this necessarily also holds for the effect
of e on the input u in the closed-loop case. □
The scheme depicted in Figure 9.17 has a number of nice properties, as formulated
next.
Proposition 9.9.4 In the block diagram as depicted in Figure 9.17, the following properties
hold:
Proposition 9.9.5 Consider the identification of the transfer function R0 with the (scalar)
model structure (9.144) using a fixed noise model K(q, θ) = K* (q) and a least squares
identification criterion. Then the asymptotic parameter estimate is characterized by

    θ* = arg min_θ (1/2π) ∫_{-π}^{π} | (1/Dc ) (1/(1 + G0 C)) [G0 - G(θ)] (1/(1 + G(θ)C)) (1/K*) |^2 Φr dω    (9.148)

       = arg min_θ (1/2π) ∫_{-π}^{π} | (1/Dc ) [ G0 /(1 + G0 C) - G(θ)/(1 + G(θ)C) ] (1/K*) |^2 Φr dω.    (9.149)
Proof: Substituting the expression (9.133) for R0 and R(θ) it follows that

    R0 - R(θ) = Dc^{-1} (1 + G0 C)^{-1} (G0 - Gx )Dx - Dc^{-1} (1 + G(θ)C)^{-1} (G(θ) - Gx )Dx
              = Dc^{-1} [G0 - G(θ)] / [(1 + G0 C)(1 + G(θ)C)] · [1 + Gx C]Dx .    (9.150)

Using the fact that [1 + Gx C]Dx = Dx + CNx = F^{-1} this leads to the required result. □
An interesting remark to be made is that the stable transfer function R0 to be estimated
in this approach is strongly dependent on the nominal system Gx that is chosen as a basis
for the dual Youla-parametrization. If we choose Gx = G0 (i.e. we have complete knowledge
of the input/output system to be identified), then the resulting R0 will be zero. Whenever
Gx is close to G0 , we may expect R0 to be a small transfer function. In this way, R0
represents the difference between the assumed knowledge about the system and the actual
system itself.
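The following sketch (hypothetical scalar systems, frequency-domain only) illustrates this: detuning the auxiliary model Gx away from G0 inflates the dual-Youla parameter R0 , while Gx = G0 makes it vanish.

```python
import numpy as np

# Sketch (hypothetical systems): the dual-Youla parameter
# R0 = Dc^{-1} (1 + C G0)^{-1} (G0 - Gx) Dx shrinks as the auxiliary model
# Gx approaches G0, and vanishes identically for Gx = G0.
w = np.linspace(1e-3, np.pi, 300)
z = np.exp(1j * w)
G0 = 0.2 / (z - 0.9)
C = 0.8 * np.ones_like(z)
Dc = Dx = 1.0                       # trivial factors for the illustration

for eps in [0.5, 0.1, 0.0]:
    Gx = 0.2 * (1 + eps) / (z - 0.9)        # auxiliary model, detuned by eps
    R0 = (G0 - Gx) * Dx / (Dc * (1.0 + C * G0))
    print(f"eps = {eps:3.1f}:  max |R0| = {np.abs(R0).max():.4f}")
```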
The presented procedure of identifying R0 is introduced, analyzed and employed in Hansen
and Franklin (1988), Schrama (1991, 1992) and Lee et al. (1992); see also Van den Hof and
Schrama (1995).
The following remarks can be made with respect to this identification method.
- The identification method fruitfully uses knowledge of the controller that is imple-
mented when experiments are performed. Knowledge of this controller is instrumental
in the identification, similar to the situation of the indirect identification method in
Section 9.4.
- An estimated factor R(θ̂N ) that is stable will automatically yield a model G(θ̂N ) that
is guaranteed to be stabilized by C.
- Having identified the parameter θ̂ and the corresponding transfer function R(θ̂)
with a fixed McMillan degree nr , the McMillan degree of G(θ̂) will generally be much
larger. This is due to the required reparametrization as presented in (9.146). This
implies that in the identification as discussed above, the complexity (McMillan degree)
of the resulting model G(θ̂) is not simply tunable. Constructing an appropriate
parameter space Θ in such a way that the corresponding set of models {G(θ), θ ∈ Θ}
has a fixed McMillan degree is a nontrivial parametrization problem that has not
been solved yet.
- The asymptotic identification criterion, as reflected by (9.148), is not dependent on
the chosen auxiliary model Gx or its factorization.
Properties of the dual-Youla method:

    Consistency (G(q, θN ), H(q, θN ))    +
    Consistency G(q, θN )                 +
    Tunable bias                          +
    Fixed model order                     -
    Unstable plants                       +
    (G(q, θN ), C) stable                 +
    C assumed known                       yes
Figure 9.18: Typical Bode magnitude plot of the sensitivity function S0 (solid) and the
related complementary sensitivity G0 C/(1 + CG0 ) (dashed).
By designing the (fixed) noise model K* (or the signal spectrum Φr1 ), this bias expression
can explicitly be tuned to the designer's needs. However, the expression is different from
the related open-loop expression (5.174). Instead of a weighted additive error on G0 , the
integrand contains an additive error on G0 S0 . Straightforward calculations show that

    θ* = arg min_θ ∫ |G0 - G(θ)|^2 Φr1 / ( |(1 + CG0 )(1 + CG(θ))|^2 |K*|^2 ) dω.    (9.155)
This implies that in the (indirect) closed-loop situation the additive error on G0 is always
weighted with S0 . Thus emphasis will be given to an accurate model fit in the frequency
region where S0 is large, and the identified model will be less accurate in the frequency
region where S0 is small. In Figure 9.18 a typical characteristic of S0 and of the closed-loop
transfer G0 C/(1 + CG0 ) is sketched. This illustrates that emphasis will be given to an
accurate model fit in the frequency region that particularly determines the bandwidth of
the control system. In this area (where |S0 | > 1), the noise contribution of v in the output
signal y is amplified by the controller. According to Bode's sensitivity integral (Sung and
Hara, 1988), for a stable controller

    ∫_0^π log |S(e^{iω})| dω = c (constant)

with c determined by the unstable poles of the plant, and c = 0 for G0 stable. This implies
that the attenuation of signal power in the low frequency range will always be compen-
sated by an amplification of signal power in the higher frequency range. The aspect
that closed-loop identification stresses the closed-loop relevance of identified (approximate)
models has been given strong attention in the research on identification for control.
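A numerical illustration of the sensitivity integral, for a hypothetical stable discrete-time loop (so that c = 0):

```python
import numpy as np

# Illustration (hypothetical loop): for a stable discrete-time loop with G0
# stable and at least one sample delay in the loop gain, the sensitivity
# integral over [0, pi] evaluates to c = 0.
w = np.linspace(1e-4, np.pi, 20000)
z = np.exp(1j * w)
L = 0.1 / (z - 0.5)                  # hypothetical loop gain C*G0 (stable)
S = 1.0 / (1.0 + L)                  # sensitivity function

integral = np.sum(np.log(np.abs(S))) * (w[1] - w[0])
print("integral of log|S| over [0, pi]:", integral)   # close to 0
```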
Whereas direct identification needs consistent estimation of noise models in order to con-
sistently identify G0 , indirect methods can do without noise models. Incorporation of noise
models in indirect methods is very well possible, but this will result in bias distributions
that become dependent on the identified noise models as well as on (the unknown) Φv .
9.10.2 Variance
For analyzing the asymptotic variance of the transfer function estimates we consider again
the prediction error framework (Ljung, 1999) that provides variance expressions that are
asymptotic in both n (model order) and N (number of data). For the direct identification
approach, and in the situation that S ∈ M, this delivers:

    cov( [ Ĝ(e^{iω}) ; Ĥ(e^{iω}) ] ) ≈ (n/N ) Φv (ω) [ Φu (ω)  Φeu (ω) ; Φue (ω)  λ0 ]^{-1} ,    (9.156)

with ur := S0 (q)r1 and ue := -CS0 (q)v the two components of the input u, and the related
spectra Φur = |S0 |^2 Φr1 and Φue_ = |CS0 |^2 Φv . Using the cross-spectrum Φue = -CS0 H0 λ0 ,
(6.63) leads to (Ljung, 1993; Gevers et al., 1997):

    cov( [ Ĝ ; Ĥ ] ) ≈ (n/N ) (Φv /Φur ) [ 1  (CS0 H0 )* ; CS0 H0  Φu /λ0 ],

and consequently

    cov(Ĝ) ≈ (n/N ) Φv /Φur ,    cov(Ĥ) ≈ (n/N ) (Φv /λ0 )(Φu /Φur ).    (9.157)
This shows that only the noise-free part ur of the input signal u contributes to variance
reduction of the transfer functions. Note that for ur = u the corresponding open-loop
results appear.
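A sketch evaluating the expressions (9.157) on a frequency grid for hypothetical G0 , C, H0 and spectra; it makes explicit that only Φur enters the variance of Ĝ:

```python
import numpy as np

# Sketch (hypothetical quantities): evaluate the asymptotic variance
# expressions (9.157) on a frequency grid; only the noise-free input spectrum
# Phi_ur = |S0|^2 Phi_r1 drives the variance of G-hat.
n, N = 10, 5000                       # model order and number of data
w = np.linspace(1e-3, np.pi, 400)
z = np.exp(1j * w)

G0 = 0.2 / (z - 0.9)                  # hypothetical plant
C = 0.8 * np.ones_like(z)             # hypothetical controller
H0 = 1.0 / (1.0 - 0.5 / z)            # hypothetical noise filter
lam0 = 0.1                            # white-noise variance
Phi_r1 = 1.0                          # reference spectrum

S0 = 1.0 / (1.0 + C * G0)
Phi_v = np.abs(H0)**2 * lam0
Phi_ur = np.abs(S0)**2 * Phi_r1       # noise-free part of the input spectrum
Phi_u = Phi_ur + np.abs(C * S0)**2 * Phi_v

cov_G = (n / N) * Phi_v / Phi_ur
cov_H = (n / N) * (Phi_v / lam0) * (Phi_u / Phi_ur)
print("max cov(G):", cov_G.max(), "  max cov(H):", cov_H.max())
```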
In Gevers et al. (2001) it is shown that for all indirect methods presented in this chapter
these expressions remain the same. However, again there is one point of difference between
the direct and indirect approach. The indirect methods arrive at the expression for cov(Ĝ)
also without estimating a noise model (situation G0 ∈ G), whereas the direct method
requires a consistent estimation of H0 for the validity of (9.157).
The asymptotic variance analysis tool gives an appealing indication of the mechanisms that
contribute to variance reduction. It also illustrates one of the basic mechanisms in closed-
loop identification, i.e. that noise in the feedback loop does not contribute to variance
reduction. Particularly in the situation that the input power of the process is limited, it
is relevant to note that only part of this input power can be used for variance reduction.
This has led to the following results (Gevers and Ljung, 1986):
- If the input power is constrained, then minimum variance of the transfer function
estimates is achieved by an open-loop experiment;
- If the output power is constrained, then the optimal experiment is a closed-loop ex-
periment.
Because of the doubly asymptotic nature of the results (N, n → ∞), this asymptotic
variance analysis tool is also quite crude.
For finite model orders, the variance results will likely become different over the several
methods. The direct method will reach the Cramer-Rao lower bound for the variance in
the situation S ∈ M. Similar to the open-loop situation, the variance will typically increase
when no noise models are estimated in the indirect methods. This will also be true for the
situation that two - independent - identification steps are performed on one and the same
data set, without taking account of the relation between the disturbance terms in the two
steps. Without adjustment of the identification criteria, the two-stage method and the
coprime factor method are likely to exhibit an increased variance because of this.
For finite model orders and the situation S ∈ M, it is claimed in Gustavsson et al. (1977)
that all methods (direct and indirect) lead to the same variance; however, for indirect
methods this result seems to hold true only for particular (ARMAX) model structures. A
more extensive analysis of the finite order case is provided in Forssell and Ljung (1999).
Special attention has to be given to the situation of the direct identification method in
closed loop. Considering the prediction error in this situation:

    ε(t, θ) = [G0 - G(θ)] / [H(θ)(1 + CG0 )] · r(t) + [H0 (1 + CG(θ))] / [H(θ)(1 + CG0 )] · e(t),    (9.158)

it can be observed that if G and H are estimated consistently, ε(t, θ) = e(t). This can
be verified by analyzing the autocovariance function R̂ε^N (τ). If R̂ε^N (τ) is a Dirac function
(taking account of the usual confidence bounds), this can be taken as an indication that
G and H are estimated consistently, since the contribution to ε(t, θ) of the r-dependent
term will generally not be a white noise term.
For checking the consistency of Ĝ one would normally perform a correlation test on R̂εr (τ),
to verify whether this covariance function is equal to 0 for all τ. However, when doing direct
identification the signal r is usually not taken into account, and the standard identification
tools will provide a correlation test on R̂εu^N (τ). The interpretation of this latter test asks
for some extra attention when closed-loop data is involved.
Writing

    ε(t, θ) = H(θ)^{-1} [(G0 - G(θ))u(t) + H0 e(t)],    (9.159)

it follows that

    R̄εu (τ) = H(q, θ)^{-1} (G0 (q) - G(q, θ)) Ru (τ) + H(q, θ)^{-1} H0 (q) Reu (τ),    (9.160)

where the filters operate on the covariance functions as on time signals.
Concerning the term Reu (τ) one can make the following observations:
- For τ > 0, Reu (τ) = 0, since e(t) is uncorrelated with past values of the input u;
- For τ = 0, Reu (τ) = 0 only if C contains a delay;
- For τ < 0, Reu (τ) is the (time-)mirrored pulse response of -H0 S0 C.
The latter statements are implied by the fact that the e-dependent term in u(t) is given by
-H0 S0 Ce(t).
When Reu (τ) is a time signal with values unequal to zero for τ ≤ 0, it can easily be observed
that whenever H(θ) ≠ H0 , there will be a nonzero contribution of the second term in (9.160)
to the cross-covariance function R̄εu (τ). As a result it follows that R̄εu (τ) = 0 does not
imply that G(θ*) = G0 .
The implication of this result is that in a situation where the classical validation tests on
R̂ε^N (τ) and R̂εu^N (τ) fail, it is unclear whether the invalidation is due to an incorrect estimate
of G0 or of H0 .
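The sign pattern of Reu (τ) is easily reproduced in simulation; the sketch below uses a hypothetical first-order loop and estimates the sample cross-covariance for a few lags.

```python
import numpy as np

# Simulation sketch (hypothetical first-order loop): the cross-covariance
# between the noise e and the input u is zero for tau > 0 but nonzero for
# tau <= 0, since u contains -C S0 H0 e through the feedback path (H0 = 1 here).
rng = np.random.default_rng(1)
T = 50000
e = rng.standard_normal(T)
r = rng.standard_normal(T)

# closed loop: y = G0 u + e, u = r - C y, with G0 = b q^-1/(1 - a q^-1), C = c
a, b, c = 0.9, 0.2, 0.5
y = np.zeros(T); u = np.zeros(T); xg = 0.0
for t in range(T):
    y[t] = xg + e[t]            # xg is the (delayed) plant output at time t
    u[t] = r[t] - c * y[t]
    xg = a * xg + b * u[t]

def xcov(p, q, tau):
    # sample cross-covariance R_pq(tau) = E p(t) q(t - tau)
    if tau >= 0:
        return np.mean(p[tau:] * q[:T - tau])
    return np.mean(p[:T + tau] * q[-tau:])

for tau in [-2, -1, 0, 1, 2]:
    print(f"R_eu({tau:+d}) = {xcov(e, u, tau):+.4f}")   # nonzero only for tau <= 0
```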
9.12 Evaluation
The assessment criteria discussed in Section 9.1.3 have been evaluated for the several
identification methods, and the results are listed in Table 9.9.
The methods that are most simply applicable are the direct method and the two-stage
method. When considerable bias is expected from correlation between u and e, the
two-stage method is to be preferred. For the identification of unstable plants the co-
prime factor, dual-Youla/Kucera and tailor-made parametrization methods are suitable, of
which the latter seems to be the most complex from an optimization point of view. When
approximate - limited complexity - models are required, the coprime factor method is attrac-
tive. When additionally the controller is not accurately known, the two-stage method has
advantages.
All methods are presented in a one-input, one-output configuration. The basic ideas as well
as the main properties are simply extendable to MIMO systems.
The basic choice between direct and indirect approaches should be found in the evaluation
of the following questions:
(a) Is there confidence in the fact that (G0 , H0 ) and e satisfy the basic linear, time-
invariant and limited order assumptions in the prediction error framework?
(b) Is there confidence in the fact that C operates as a linear time-invariant controller?
The direct method takes an affirmative answer to (a) as a starting point. Its results are not
dependent on controller linearity; however, the method requires exact modelling in terms of
question (a). The indirect methods are essentially dependent on an affirmative answer to
(b), and might be more suitable to handle departures from aspect (a).
So far the experimental setup has been considered where a single external signal r1 is
available from measurements. In all methods the situation of an available signal r2 (instead
of r1 ) can be treated similarly without loss of generality. A choice of a more principal
nature is reflected by the assumption that the controller output is measured disturbance-
free. This leads to the (exact) equality

    r = u + C(q)y.

The above equality displays that whenever u and y are available from measurements, knowl-
edge of r and C is completely interchangeable. I.e. when r is measured, this generates full
knowledge of C, through a noise-free identification of C on the basis of a short data sequence
r, u, y. Consequently, for the indirect methods that are listed in Table 9.9, the requirement
of having exact knowledge of C is not a limitation.
This situation is different when considering an experimental setup where the controller
output (like the plant output) is disturbed by noise. Such a configuration is depicted in
Figure 9.19, where d is an additional (unmeasurable) disturbance signal, uncorrelated with
the other external signals r and v.

(Figure 9.19: closed-loop configuration with an additional disturbance d acting on the controller output.)

In this case

    r + d = u + C(q)y,

and apparently now there does exist a principal difference between the information content
in r and in knowledge of C. Two situations can be distinguished:
- r is available and C is unknown. In this case the indirect identification methods
have no other option than to use the measured r as the external signal in the several
methods. In this way the disturbance d will act as an additional disturbance signal
in the loop, which will lead to an increased variance of the model estimates.
- C is exactly known. In this case the signal u + Cy can be exactly reconstructed and
subsequently be used as the external signal in the several indirect methods. In
this way the disturbance signal d is effectively used as an external input, leading to
an improved signal-to-noise ratio in the estimation schemes, and thus to a reduced
variance of the model estimates.

Footnotes to Table 9.9:
1 Only in those situations where the real plant (G0 , H0 ) has an ARX or ARMAX structure.
2 Not possible to identify unstable plants if in the second step attention is restricted to independently
parametrized G and K.
3 An accurate (high order) estimate of G0 as well as knowledge of C is required; this information can be
obtained from data.
4 For the indirect method, stability is guaranteed only if C is stable.
5 Consistency holds when the parameter set Θ is restricted to a connected subset containing the exact
plant vector θ0 .
When measured signals u and y are given, one can ask what the extra information
content of knowledge of r and/or C is. From the comparative results of direct and
indirect identification methods, one can conclude that this extra information allows the
consistent identification of G0 , irrespective of the noise model H0 .
An additional aspect that may favour closed-loop experiments over open-loop ones is the
fact that a controller can have a linearizing effect on nonlinear plant dynamics; the presence
of the controller can cause the plant to behave linearly in an appropriate working point.
For all identification methods discussed, the final estimation step comes down to the ap-
plication of a standard (open-loop) prediction error algorithm. This implies that the
standard tools can also be applied when it comes down to model validation (Ljung, 1999).
Appendix
Lemma 9A.1 (Schrama, 1992). Consider rational transfer functions G0 (z) with right co-
prime factorization (N, D) and C(z) with left coprime factorization (Dc , Nc ). Then

    T (G0 , C) := [ G0 ; I ] (I + CG0 )^{-1} [ C  I ]

is stable if and only if Dc D + Nc N is stable and stably invertible. □

Proof: Denote Λ = Dc D + Nc N . Then

    T (G0 , C) = [ N ; D ] Λ^{-1} [ Nc  Dc ].

If Λ^{-1} is stable then T (G0 , C) is stable, since all coprime factors are stable. This proves
(⇐).
Since (N, D) is right coprime, there exist stable X, Y such that XN + Y D = I.
Similarly, since (Dc , Nc ) is left coprime, there exist stable Xc , Yc such that Nc Xc + Dc Yc = I.
Stability of T (G0 , C) implies stability of

    [ X  Y ] T (G0 , C) [ Xc ; Yc ] = Λ^{-1} ,

which proves (⇒). □
Bibliography
B.D.O. Anderson and M.R. Gevers (1982). Identifiability of linear stochastic systems operating under linear feedback. Automatica, 18, no. 2, 195-213.
B. Codrons, B.D.O. Anderson and M. Gevers (2002). Closed-loop identification with an unstable or nonminimum phase controller. Automatica, 38, 2127-2137.
R.A. de Callafon, P.M.J. Van den Hof and M. Steinbuch (1993). Control-relevant identification of a compact disc pick-up mechanism. Proc. 32nd IEEE Conf. Decision and Control, San Antonio, TX, pp. 2050-2055.
C.A. Desoer, R.W. Liu, J. Murray and R. Saeks (1980). Feedback system design: the fractional representation approach to analysis and synthesis. IEEE Trans. Automat. Contr., AC-25, pp. 399-412.
R.A. Eek, J.A. Both and P.M.J. Van den Hof (1996). Closed-loop identification of a continuous crystallization process. AIChE Journal, 42, pp. 767-776.
U. Forssell and L. Ljung (1999). Closed-loop identification revisited. Automatica, 35, 1215-1241.
U. Forssell and L. Ljung (2000). A projection method for closed-loop identification. IEEE Trans. Autom. Control, 45, 2101-2106.
M. Gevers and L. Ljung (1986). Optimal experiment designs with respect to the intended model application. Automatica, 22, 543-554.
M. Gevers, L. Ljung and P.M.J. Van den Hof (2001). Asymptotic variance expressions for closed-loop identification. Automatica, 37, 781-786.
M. Gilson and P.M.J. Van den Hof (2001). On the relation between a bias-eliminated least-squares (BELS) and an IV estimator in closed-loop identification. Automatica, 37, 1593-1600.
M. Gilson and P.M.J. Van den Hof (2003). IV methods for closed-loop system identification. Preprints 13th IFAC Symposium on System Identification, August 27-29, 2003, Rotterdam, The Netherlands, pp. 537-542.
I. Gustavsson, L. Ljung and T. Soderstrom (1977). Identification of processes in closed loop - identifiability and accuracy aspects. Automatica, 13, 59-75.
F.R. Hansen and G.F. Franklin (1988). On a fractional representation approach to closed-loop experiment design. Proc. American Control Conf., Atlanta, GA, USA, pp. 1319-1320.
T. Kailath (1980). Linear Systems. Prentice Hall, Englewood Cliffs, NJ.
W.S. Lee, B.D.O. Anderson, R.L. Kosut and I.M.Y. Mareels (1992). On adaptive robust control and control-relevant system identification. Proc. 1992 American Control Conf., Chicago, IL, USA, pp. 2834-2841.
L. Ljung (1993). Information content in identification data from closed-loop operation. Proc. 32nd IEEE Conf. Decision and Control, San Antonio, TX, pp. 2248-2252.
L. Ljung (1999). System Identification - Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition.
T.S. Ng, G.C. Goodwin and B.D.O. Anderson (1977). Identifiability of MIMO linear dynamic systems operating in closed loop. Automatica, 13, pp. 477-485.
A.G. Partanen and R.R. Bitmead (1993). Excitation versus control issues in closed loop identification of plant models for a sugar cane crushing mill. Proc. 12th IFAC World Congress, Sydney, Australia, Vol. 9, pp. 49-56.
R.J.P. Schrama (1990). Open-loop identification of feedback controlled systems. In: O.H. Bosgra and P.M.J. Van den Hof (Eds.), Selected Topics in Identification, Modelling and Control. Delft University Press, Vol. 2, pp. 61-69.
R.J.P. Schrama (1991). An open-loop solution to the approximate closed loop identification problem. In: C. Banyasz & L. Keviczky (Eds.), Identification and System Parameter Estimation 1991. IFAC Symposia Series 1992, No. 3, pp. 761-766. Sel. Papers 9th IFAC/IFORS Symp., Budapest, July 8-12, 1991.
R.J.P. Schrama (1992). Approximate Identification and Control Design - with Application to a Mechanical System. Dr. Dissertation, Delft Univ. Technology, 1992.
T. Soderstrom, L. Ljung and I. Gustavsson (1976). Identifiability conditions for linear multivariable systems operating under feedback. IEEE Trans. Automat. Contr., AC-21, pp. 837-840.
T. Soderstrom and P. Stoica (1983). Instrumental Variable Methods for System Identification. Lecture Notes in Control and Information Sciences, Springer-Verlag, New York.
T. Soderstrom, P. Stoica and E. Trulsson (1987). Instrumental variable methods for closed-loop systems. In: 10th IFAC World Congress, Munich, Germany, pp. 363-368.
T. Soderstrom and P. Stoica (1989). System Identification. Prentice-Hall, Hemel Hempstead, U.K.
H.K. Sung and S. Hara (1988). Properties of sensitivity and complementary sensitivity functions in single-input single-output digital control systems. Int. J. Control, 48, 2429-2439.
M.C. Tsai, E.J.M. Geddes and I. Postlethwaite (1992). Pole-zero cancellations and closed-loop properties of an H∞ mixed sensitivity design problem. Automatica, 28, pp. 519-530.
P.M.J. Van den Hof, D.K. de Vries and P. Schoen (1992). Delay structure conditions for identifiability of closed-loop systems. Automatica, 28, no. 5, pp. 1047-1050.
P.M.J. Van den Hof and R.J.P. Schrama (1993). An indirect method for transfer function estimation from closed loop data. Automatica, 29, no. 6, pp. 1523-1527.
P.M.J. Van den Hof, R.J.P. Schrama, O.H. Bosgra and R.A. de Callafon (1995). Identification of normalized coprime plant factors from closed-loop experimental data. European J. Control, Vol. 1, pp. 62-74.
P.M.J. Van den Hof and R.J.P. Schrama (1995). Identification and control - closed-loop issues. Automatica, Vol. 31, pp. 1751-1770.
P.M.J. Van den Hof and R.A. de Callafon (1996). Multivariable closed-loop identification: from indirect identification to dual-Youla parametrization. Proc. 35th IEEE Conference on Decision and Control, Kobe, Japan, 11-13 December 1996, pp. 1397-1402.
P.M.J. Van den Hof (1998). Closed-loop issues in system identification. Annual Reviews in Control, Volume 22, pp. 173-186. Elsevier Science, Oxford, UK.
E.T. van Donkelaar and P.M.J. Van den Hof (2000). Analysis of closed-loop identification with a tailor-made parametrization. European Journal of Control, Vol. 6, no. 1, pp. 54-62.
M. Vidyasagar (1985). Control System Synthesis - A Factorization Approach. MIT Press, Cambridge, MA, USA.
Y.C. Zhu and A.A. Stoorvogel (1992). Closed loop identification of coprime factors. Proc. 31st IEEE Conf. Decision and Control, Tucson, AZ, pp. 453-454.
Appendix A
Statistical Notions
var(x1 ) := cov(x1 , x1 ).
Estimation
Consider an estimate θ̂N of θ0 , based on N data points. The following properties are
directed towards θ̂N .
a Unbiased. θ̂N is unbiased if E θ̂N = θ0 .
c Consistent. θ̂N is (weakly) consistent if for every ε > 0,

    lim_{N→∞} Pr( |θ̂N - θ0 | > ε ) = 0.

θ̂N is strongly consistent if Pr( lim_{N→∞} θ̂N = θ0 ) = 1, implying that for almost all
realizations θ̂N the limiting value lim_{N→∞} θ̂N is equal to θ0 . This is also denoted as
θ̂N → θ0 with probability (w.p.) 1 as N → ∞. Strong consistency implies weak consistency.
Efficient. θ̂*N is efficient if

    cov(θ̂*N ) ≤ cov(θ̂N ) for all N,

for all consistent estimates θ̂N . In this expression the inequality should be interpreted
in a matrix sense, being equivalent to the condition that cov(θ̂N ) - cov(θ̂*N ) is a positive
semi-definite matrix.
    θ̂N ∈ AsN (θ* , P ),

with θ* the asymptotic mean, and P the covariance matrix of the asymptotic pdf.
Since for consistent estimates P will be 0, the above expression will generally be written
as

    (θ̂N - θ*) ∈ AsN (0, (1/N ) P̄ ),

with P̄ referred to as the asymptotic covariance matrix of θ̂N .
Theorem of Slutsky
Let x(N ), x1 (N ), x2 (N ) be sequences of random variables for N ∈ N.
(b) If plim_{N→∞} x(N ) = x and h is a continuous function of x(N ), then plim_{N→∞} h(x(N )) =
h(x).
χ²-distribution
If x1 , x2 , . . . , xn are independent normally distributed random variables with mean 0 and
variance 1, then Σ_{i=1}^n x_i^2 is a χ²-distributed random variable with n degrees of freedom,
denoted as

    Σ_{i=1}^n x_i^2 ∈ χ²(n),

with the properties that its mean value is n and its variance is 2n.
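A short simulation check of these moments (illustrative only):

```python
import numpy as np

# Sanity check of the chi-square moments: a sum of n squared standard
# normals has mean close to n and variance close to 2n.
rng = np.random.default_rng(0)
n = 5
samples = (rng.standard_normal((100000, n))**2).sum(axis=1)
print("mean ~ n:", samples.mean(), "  var ~ 2n:", samples.var())
```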
Appendix B
Matrix Theory
Block matrices
If a matrix A is nonsingular, then

    det [ A  B ; C  D ] = det A · det(D - CA^{-1} B)    (B.2)

and, with E := A^{-1} B, F := CA^{-1} and Δ := D - CA^{-1} B,

    [ A  B ; C  D ]^{-1} = [ A^{-1} + E Δ^{-1} F   -E Δ^{-1} ; -Δ^{-1} F   Δ^{-1} ].    (B.3)

Singular value decomposition
For every q × r matrix P there exist matrices U and V with

    U^T U = Iq    (B.4)
    V^T V = Ir    (B.5)

such that

    P = U Σ V^T

with Σ a diagonal matrix with nonnegative diagonal entries

    σ1 ≥ σ2 ≥ · · · ≥ σmin(q,r) ≥ 0.
Proposition B.2 Let P be a q × r matrix with rank n, having an SVD P = U Σ V^T , and let
k < n. Denote

    Pk := U Σk V^T ,    Σk = Σ [ Ik  0 ; 0  0 ].    (B.6)

Then Pk minimizes both

    ‖P - P̃‖2 and ‖P - P̃‖F    (B.7)

over all matrices P̃ of rank k. Additionally,

    ‖P - Pk ‖2 = σk+1 , and
    ‖P - Pk ‖F = ( Σ_{i=k+1}^{min(q,r)} σi^2 )^{1/2} .
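A numerical check of the proposition with a random matrix (illustrative sketch):

```python
import numpy as np

# Numerical illustration of Proposition B.2: the rank-k SVD truncation Pk
# achieves ||P - Pk||_2 = sigma_{k+1} and the stated Frobenius-norm value.
rng = np.random.default_rng(0)
P = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(P)
k = 2
Pk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(P - Pk, 2), "=", s[k])                      # sigma_{k+1}
print(np.linalg.norm(P - Pk, 'fro'), "=", np.sqrt((s[k:]**2).sum()))
```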
Appendix C
C.1
Definition C.1 (Characteristic polynomial) For any square n × n matrix A, the char-
acteristic polynomial of A is defined as

    a(z) = det(zI - A) = z^n + a1 z^{n-1} + a2 z^{n-2} + · · · + an .

The equation a(z) = 0 is called the characteristic equation of A, and the n roots of this
equation are called the eigenvalues of A.
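For illustration (not part of the original notes), numpy computes exactly these objects:

```python
import numpy as np

# np.poly(A) returns the coefficients of the characteristic polynomial
# det(zI - A); its roots coincide with the eigenvalues of A.
A = np.array([[0.0, 1.0], [-0.5, 1.2]])
coeffs = np.poly(A)                      # [1, -1.2, 0.5] for this A
print(coeffs)
print(np.roots(coeffs))                  # roots of a(z)
print(np.linalg.eigvals(A))              # eigenvalues of A: the same values
```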
Index
OE, 108
orthonormal basis functions, 167
output error model structure, 108
overfit, 220
parametrization, 107
periodogram, 28
periodogram averaging, 68
persistence of excitation, 125
power spectral density, 23, 28
power spectrum, 38
power-signals, 19
PRBS, 200, 211
prediction, 102
prediction error, 104
prediction error identification, 99
predictor models, 106
quasi-stationary signals, 37
RBS, 197
rcf, 265
residual tests, 223