Master's Thesis
Acknowledgements
I want to thank Paul O’Leary, the head of the Chair of Automation at the Montanuniversity
Leoben, for enabling this work. I’m thankful to my supervisor Matthew Harker for all his
support and the helpful discussions. I would like to thank Petra Hirtenlehner for always
being helpful with any organisational matters, and all the other people working at the Chair
of Automation for providing a welcoming and friendly working environment. I’m grateful
to my parents who always supported me throughout my life, in the pursuit of my education
and through this venture.
Abstract
Kurzfassung
Contents
1 Introduction
1.1 Problem Formulation
1.2 Systems and Models
1.3 System Identification and its Place in Control Theory
1.4 Why are Linear Models Preferred over Nonlinear Models?
4 Case Studies
4.1 Free Fall with Drag due to Air
4.1.1 The Riccati Equation
4.1.2 System Modeling
4.1.3 Experimental Design
4.1.4 System Identification Based on One Data Set
4.1.5 System Identification Based on Two Data Sets
4.1.6 Statistical Analysis
4.1.7 System Identification Based on More Than Two Data Sets
4.1.8 System Identification of the Model Parameter and the Initial Conditions of the Experiment
4.2 The Duffing Equation and the Nonlinear Mass and Spring System
4.2.1 Simulation Model
4.2.2 System Identification Based on Two Data Sets A and B
4.2.3 Statistical Analysis – Monte-Carlo Simulation
4.2.4 Multiple Starting Point Procedure
4.3 Dynamic Friction Models – Stribeck Effect
4.3.1 Modeling of Friction
4.3.2 The LuGre Friction Model
4.3.3 The Simulation Model
4.3.4 System Identification of the LuGre Friction Model
5 Conclusion
List of Figures
1.1 Block diagram model of a system with an input u(t) and an output y(t).
4.1 Local minima of the objective function for the free fall experiment A.
4.2 Position data and estimate of the object in free fall with drag due to air resistance.
4.3 Velocity estimate of the object in free fall with drag due to air resistance.
4.4 Local minima of the objective functions for the free fall experiments A and B.
4.5 Estimated position trajectories for the free fall experiment based on the data sets A and B.
4.6 Estimated velocity trajectories for the free fall experiment based on the data sets A and B.
4.7 Cumulative distribution function of the results of the Monte-Carlo simulation for the free fall experiment based on one and two data sets A and B.
4.8 Histogram of the results of the Monte-Carlo simulation for the free fall experiment based on one data set A.
4.9 Histogram of the results of the Monte-Carlo simulation for the free fall experiment based on two data sets A and B.
4.29 (v(t), Ff(t)) characteristic for smaller excitation levels u(t) < μ0 mg.
4.30 (v(t), Ff(t)) characteristic for higher excitation levels u(t) > μ0 mg.
4.31 Numerically approximated Jacobian for a triangular input signal u(t) with û = 0.5 μ0 mg.
4.32 Selected components of the numerically approximated Jacobian for a triangular input signal u(t) with û = 0.5 μ0 mg.
4.33 Numerically approximated Jacobian for a triangular input signal u(t) with û = 1.5 μ0 mg.
4.34 Triangular input signal uA(t) with û = 0.5 μ0 mg that is applied to the LuGre friction model in order to generate identification data.
4.35 Identification data set generated by adding noise to the sampled response of the exactly parametrized system to the input signal uA(t).
4.36 Identification data set generated by adding noise to the sampled response of the exactly parametrized system to the input signal uB(t).
4.37 Comparison of the system response to the input signal uA(t) for the exact and the identified parameter sets.
4.38 Comparison of the system response to the input signal uB(t) for the exact and the identified parameter sets.
List of Tables
4.1 Results of the system identification based on the data from the free fall experiment A.
4.2 Results of the system identification based on the data from the free fall experiments A and B.
4.3 Results of the Monte-Carlo simulation of the system identification of the free fall experiment based on one and two data sets.
4.4 Results of the Monte-Carlo simulation of the system identification of the parameter and the initial conditions of the free fall experiment.
4.5 Results of the Monte-Carlo simulation for the nonlinear mass and spring system after the removal of outliers.
Chapter 1
Introduction
The simulated model output can be compared to the measured data set. This can be done by visual comparison or by a limited statistical analysis. In addition, separate validation data sets from different experiments can be used to compare the performance of the model under different operating conditions, or the identification data can be split up in a process called cross-validation.
When dealing with measurement signals perturbed by noise, it is of interest to look at the influence the noise has on the identified model parameters. The influence of Gaussian noise in the data sets, as well as the utilization of multiple data sets in order to minimize the influence of noise, will be explored.
Figure 1.1: Block diagram model of a system with an input u(t) and an output y(t).
A model is, in essence, a set of mathematical relations describing the behaviour of the system. A simple block diagram model of a system S can be seen in Fig. 1.1.
The main difference between a static and a dynamic system is that for a static system the output depends only on the current value of the input. In that sense a static system can be described by a simple algebraic equation, e.g. y(t) = g(u(t)).
For a dynamic system the observed output depends on the past history of the input, i.e. the system evolves differently depending on the entire input sequence applied to it over time. The tool that allows us to encode the effect that the state of the system and the input to the system at a certain point in time have on the change of the trajectory is the differential equation. The internal mechanisms of the system are
defined by differential equations describing the interaction of the internal properties in
between themselves as well as their interaction with the external influences. Higher order
differential equations can be transformed into first order ones by assigning additional state
variables. This results in the state space representation of the dynamic system as a system
of first order differential equations [3]:
∂x/∂t = f(x(t), u(t), t). (1.2)
The output variables y(t) are the selected properties of the system that we are interested
in. They represent the measurements that we are making. The outputs y(t) at any point
in time are related to the external inputs u(t) at that time as well as the internal state of
the system x(t) at that time. This means that once the state space equation is solved and
we know the trajectory of the state variables for the time period that we are interested in,
the measurement variables follow from an algebraic relation, the observer equation y(t) = g(x(t), u(t), t).
Having knowledge about any two of these three attributes – the system S, its input u(t) and its output y(t) – gives rise to one of three distinct kinds of problems: determining the third [4].
The first kind of task is the simulation problem. Given a system S and a control input u(t),
find the output y(t). As a forward problem, this is the simplest of the three types of problems outlined above. In general, there exists one unique solution that is the
result of solving an initial value problem (IVP) for the system of differential equations
describing the system S from the initial state of the system x(t0 ) at the initial time t0 over
the time frame of the defined control input u(t).
The second kind of problem is the control problem. Given a system S and a desired
system output, i.e. behaviour, y(t), find the input u(t). This is an inverse problem that
may not have a single unique solution. Depending on the exact problem statement, it will
result in one or multiple correct solutions. In general these kinds of problems are more loosely formulated with respect to the requirements on the desired behaviour of the system.
In many cases it is demanded that a system is brought to a specific state, but the specific
path to take and the exact time frame in which to achieve this are not exactly specified.
In fact there are infinitely many possible paths and corresponding control sequences that
move a system from one state to another [5]. In many cases there are also compromises
to be made. For example, in a classical PID controlled system a faster rise time needs to be weighed against the ensuing overshoot and oscillations.
The third and last kind of problem is the system identification problem. Given an input u(t)
and an output y(t) of a system, find a system model S that is able to describe this observed
input-output behaviour. This again is an inverse problem that results in many possible
solutions. The quality of the identified model can be assessed in two ways. On the one
hand the ability to represent or interpolate the given input-output data, i.e. the goodness-
of-fit, is the determining factor in the solution of the system identification problem. On the
other hand the suitability of the identified model to be able to make accurate predictions
for different operating conditions when applied in a simulation problem might be of great
importance. Depending on the intended application of the model, shortcomings in this
suitability for extrapolation would rule out models regardless of the goodness of fit to the
given identification data.
The main argument as to why one should always favour modeling a system as a linear
model rather than a nonlinear one is that linear models are naturally easier to deal with.
As per definition, linear systems have a more narrowly defined general structure than nonlinear ones.
Figuratively, if the space of all system models was a sea, the space of linear systems
would just be a little island. Because of both the simplicity and the generality of the linear system, the focus of research has always been weighted towards this side. Furthermore, many nonlinear problems can only be approached by locally approximating the nonlinear problem by a linear one. In addition, any approach to deal with some type of problem may work on simple tasks but break down on more complicated ones.
This has led to a lot of different ways to analyse and work with linear systems. For one, it is possible to look at linear systems in the time and in the frequency domain. This is enabled via the Laplace and the Z-transformation. The Laplace transform enables the solution of a certain class of ordinary differential equations, which correspond to single input linear time invariant dynamic systems:
a_n ∂^n y/∂t^n + · · · + a_1 ∂y/∂t + a_0 y(t) = b_m ∂^m u/∂t^m + · · · + b_1 ∂u/∂t + b_0 u(t) (1.6)
H(s) = Y(s)/U(s) = (b_m s^m + · · · + b_1 s + b_0) / (a_n s^n + · · · + a_1 s + a_0). (1.7)
Note that here the initial conditions are set to be zero. The transfer function H(s) is
based on the Laplace transform of the impulse response of the system. Any input u(t)
can be approximated as a sum of discrete impulses. The response of a linear system to
the input u(t) is therefore the superposition of the response to each of these impulses. For
an infinitesimally fine discretization this corresponds to a convolution integral in the time
domain
y(t) = h(t) ∗ u(t) , (1.8)
where h(t) is the response of the system to a unit impulse. In the s domain, the convolution simplifies to a simple multiplication: Y(s) = H(s) U(s).
For nonlinear dynamic systems this is less applicable. The input-output behaviour of a nonlinear dynamic system cannot simply be characterized by an impulse response as in linear systems, because the principle of superposition does not apply [6].
There are various solution methods available to solve the IVP for a linear system. Besides finding an analytic solution, which is arguably easier for linear systems, any linear or nonlinear vector field defined as in equation (1.4) or (1.5) can be numerically integrated. The simplest algorithm to numerically integrate the solution of the IVP is Euler integration. More advanced algorithms like Runge-Kutta type methods use several evaluations per step, possibly with varying step size, in order to get more accurate results. Another
approach is to discretize the state variables in time and to apply a numerical differentia-
tion matrix in order to approximate the discretized derivatives of each state variable. For
linear systems this transforms the system of linear differential equations into a system of
linear equations, which can be solved directly as a least squares problem with equality
constraints, which enforce the initial conditions [7]. It is therefore called the Global Least
Squares solution method. The power of this solution method lies in the fact that it can also
be used for the solution of the inverse problem [8]. As was recently shown, the Global
Least Squares solution method in conjunction with the variable projection method can be
used in the time domain system identification of linear state-space models [9].
The Global Least Squares solution method can also be applied to the solution of the IVP for a system of nonlinear differential equations, but this results in a system of nonlinear equations, the least squares fit of which needs to be optimized in an iterative manner,
relying on good initial estimates of the solution [10].
For homogeneous, i.e. unforced, systems of linear differential equations it is also possible to deploy the exponential matrix, which is based on the analytic solution of a linear system, to compute the state of the system at any point in time [11].
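As an illustration, a minimal MATLAB sketch of this approach for a hypothetical homogeneous system ẋ = Ax could look as follows; the matrix A and the initial state x0 below are assumed example values, not taken from this work.

% state of a homogeneous linear system dx/dt = A*x at time t via the exponential matrix
A  = [0 1; -2 -0.5];      % example system matrix (assumed): lightly damped oscillator
x0 = [1; 0];              % assumed initial state x(t0)
t  = 2.5;                 % point in time at which the state is evaluated
xt = expm(A*t) * x0;      % x(t) = expm(A*t)*x0, no step-by-step integration required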
When general nonlinear models are considered, they are often represented by a subclass of so-called hybrid models, e.g. a Wiener model, which consists of a linear system
followed by a static nonlinearity, or a Hammerstein model, which consists of a nonlinear
actuator that feeds into a linear system [12]. This approach comes with the advantage
that techniques for the analysis of linear systems can be applied for the analysis of the
respective part of the hybrid model.
To summarize, while there is a whole array of methods available to deal with linear dy-
namic systems, the general approach to solve the simulation problem for a nonlinear
dynamic system is an approximation of the solution via numerical integration. Further-
more, the fact that the input-output behaviour of a nonlinear dynamic system cannot be
classified by a single impulse response has consequences for the selection of identification
data for the system identification problem of nonlinear dynamic systems.
Chapter 2

System Identification Theory
In this chapter some of the core principles of system identification theory are dis-
cussed. These include an overview of the system identification procedure, the selection
of a model structure whose parameters are to be identified, the experimental design that
is used in order to extract information about the object of interest, the goodness-of-fit cri-
terion that is based on the output error of the model, and the verification of the estimated
model parametrization.
For a general, presumably nonlinear system in state space form, a parametrized model structure can be described as

ẋ(t) = f(x(t), u(t), t, ψ), y(t) = g(x(t), u(t), t, ψ).

The determination of the parameters ψ for any given model structure is the system identification problem.
In summary, the system identification of black-box models requires less information about the object of interest and less work leading up to the identification process, but it also results in a system model that gives less insight into the system that is to be identified.
The sum of the squared vertical distances can be used as the goodness-of-fit measure. This reduces the problem of fitting a function to each point in the given data to the problem of minimizing a scalar valued optimality criterion ε(ψ), which can be defined in the following equivalent ways for n measurement points:

ε(ψ) = ∑_{i=1}^{n} [ξ(ti) − y(ti)]²  (2.6)
ε(ψ) = ‖ξ − y(ψ)‖₂²  (2.7)
ε(ψ) = [ξ − y(ψ)]ᵀ [ξ − y(ψ)]  (2.8)
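For illustration, the equivalent forms (2.6) to (2.8) translate directly into code. The sketch below is written in MATLAB and assumes a function y_of_psi(psi) that returns the model output vector at the measurement times (such a function is constructed in chapter 3); xi denotes the vector of measurements.

% xi ... column vector of measurements, y_of_psi(psi) ... model output (assumed given)
epsilon = @(psi) sum((xi - y_of_psi(psi)).^2);                   % summation form (2.6)
% equivalent formulations:
% epsilon = @(psi) norm(xi - y_of_psi(psi))^2;                   % squared 2-norm (2.7)
% epsilon = @(psi) (xi - y_of_psi(psi)).'*(xi - y_of_psi(psi));  % quadratic form (2.8)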
These two objective functions can be minimized simultaneously by combining them into

εAB(ψ) = ‖ [ξA; ξB] − [y(ψ); y(ψ)] ‖₂² . (2.11)
When the results of two different experiments are used, there ought to be two different
functions yA (ψ) and yB (ψ) for the model output.
εAB(ψ) = ‖ [ξA; ξB] − [yA(ψ); yB(ψ)] ‖₂² (2.12)
For a system with multiple measured output variables a similar approach can be applied.
In matrix form this can be represented by a weighting matrix W with diagonal entries wi, i.e. ε(ψ) = [ξ − y(ψ)]ᵀ W [ξ − y(ψ)].
The weighting of a specific model output can be especially useful when performing the
system identification based on multiple data sets. Separate weights can be assigned to the
different data sets in order to balance out the effect that each individual data set has on the
overall objective function value.
εAB(ψ) = ‖ [WA 0; 0 WB] ( [ξA; ξB] − [yA(ψ); yB(ψ)] ) ‖₂² (2.15)
The objective function ε(ψ) is non-negative and would only be zero if ξ = y(ψ). In general this equality will never be satisfied because of noise perturbations δ in the measurement that cannot be reproduced by the model [16]. If the parameter set ψ corresponding to the global minimum of equation (2.6) is found, one might expect the value of the objective function to be the squared norm of the noise ‖δ‖₂².
This is true when speaking about the expected value in its statistical meaning. When performing the system identification based on one set of noisy measurements ξi, resulting in the parameter set ψi, the estimated model output will not optimally interpolate the exact system output yexct but will rather try to approximate the measurement noise as well as possible, as it is minimizing the sum of squares of [ξi − y(ψi)]. For an unbiased estimator the expected value is the exact value of the parametrization of the underlying system that produced the observations. However, because of the limited available measurement data, the estimated results will deviate from the exact parametrization.
That is, any identified parameter set ψ depends on the design model y(ψ) representing the
real system, the experimental design used to extract information about the system repre-
sented by yexct , and the accuracy of the measurement represented by δ.
This brings us to the term model uncertainty. As the disturbance in the measurement
induces a variation of the objective function which results in a variation of the identi-
fied parameters, so does this uncertainty of the parameters affect the output of the model.
Model uncertainty can usually be counteracted by increasing the measurement time, as
this effectively increases the signal to noise ratio, resulting in less disturbance induced variation [17]. For Gaussian measurement noise the sum of squares optimality criterion is the
maximum likelihood estimator [18]. Under this condition and with infinite amounts of
identification data, the minimization of ε(ψ) would result in the exact parameter set. The
problem is that in practice there will only ever be a finite amount of identification data
available.
As ψ is the result of a numerical optimization of the curve fitting problem, the applied numerical solution method also has an effect on this solution. For a well posed curve
fitting problem that is optimized with a correctly set up numerical optimization method it
can be assumed that the influence of a numerical discretization error on the identified pa-
rameter set is at least small compared to a disturbance induced variation if not negligible,
as long as the iterative solution method is able to converge to a global minimizer of ε(ψ).
2.6 Verification
The task of verifying the validity of the model in its purpose of describing the behaviour
of the object of interest largely depends on the operational range in which the simulation
model is to be used. At a bare minimum the simulation model should be able to reproduce
the behaviour observed in the experiments used as a basis for the system identification.
This can be verified by comparing the estimated system output of the system model to the
identification data sets.
Of course, it would be advantageous if the simulation model made it possible to accurately predict the behaviour of the system under operating conditions different from those presented in the given identification data of the system. Therefore, it is useful to have a
verification data set at hand, which was measured in a separate experiment and was not included in the identification procedure. The simulated model output under the identified parametrization for this experimental setup can then be compared to the corresponding verification data set. This makes it possible to evaluate whether the simulation model can not only reproduce the observed behaviour presented by the identification data sets but also predict the behaviour of the object of interest under different operating conditions. In general, this procedure is called cross-validation, where only a subset of the available data is used for the estimation and the rest is used for the validation.
Chapter 3

Numerical Solution Methods
In this chapter the two main numerical tools that can be used in the system identification
of nonlinear dynamic systems are looked at: The numerical approximation of the solution
of initial value problems and the numerical solution of nonlinear curve fitting problems. A
selected number of the available algorithms for each of these problems are discussed. At
the end of this chapter an overview of one implementation of an algorithm for the system
identification of nonlinear dynamic systems is presented.
ẋ(t) = ∂x/∂t = f(x(t), u(t), t). (3.1)
The problem of finding a solution to this system of first order differential equations for a
control input u(t) and an initial state x(t = t0 ) = x0 is called the initial value problem.
For simple systems, this can be done analytically. However, for most nonlinear systems
there is no known analytic solution and the best approach to solving them is to develop a
numerical approximation which is good enough [19].
Given the inputs u(t) for t ∈ [t0 ,t f ] and the state x(t) of the system at time t0 the solu-
tion of the initial value problem for any vector field as defined in equation (3.1) can be
approximated by numerical integration.
3.1.1 Euler-Integration
Numerical integration is the iterative extrapolation of the state trajectory x(t) from an
initial point x(t0 ) based on gradient information given by ẋ(t) = f (x(t), u(t),t). For a
given u(t) this simplifies to ẋ(t) = f (x(t),t).
Euler integration is one of the simplest numerical integration methods. It is based on the
approximation of the increment Δx = x(t + Δt) − x(t) with the help of the derivative ẋ(t) = f(x(t), t). The equation for it can be derived by truncating the Taylor series expansion of x(t) after the first derivative term:

x(t + Δt) ≈ x(t) + Δt f(x(t), t). (3.2)
This is the explicit Euler integration method as the approximation of the state at time
t + Δt explicitly depends on the state of the system at time t.
By expanding the Taylor series from the point t + Δt backwards, a slightly different equation for the approximation of the increment can be derived:

x(t + Δt) ≈ x(t) + Δt f(x(t + Δt), t + Δt).
This is an implicit equation in x(t + Δt) and the method is therefore called the implicit Euler integration formula. It is a more stable numerical integration method but comes with the disadvantage that this implicit equation has to be solved for x(t + Δt) in every step.
On a side note, equation (3.2) corresponds to a forward finite difference formula for the approximation of the derivative, ẋ(t) ≈ [x(t + Δt) − x(t)]/Δt.
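A minimal MATLAB sketch of the explicit Euler method, assuming a right hand side function f(x, t) and a fixed step size Δt, could look as follows:

% explicit Euler integration of dx/dt = f(x,t) with constant step size dt
function [t, x] = euler_explicit(f, x0, t0, tf, dt)
    t = (t0:dt:tf).';                    % time grid
    x = zeros(numel(t), numel(x0));      % one row of states per time step
    x(1,:) = x0(:).';
    for k = 1:numel(t)-1
        % increment according to equation (3.2)
        x(k+1,:) = x(k,:) + dt * f(x(k,:).', t(k)).';
    end
end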
For example, the fourth order Runge-Kutta method uses four evaluations of ẋ(t) = f(x(t), t) for every step Δt [21]:

x(t + Δt) ≈ x(t) + (1/6) k1 + (1/3) k2 + (1/3) k3 + (1/6) k4. (3.13)
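The stage values k1 to k4 are not reproduced above; the sketch below uses the standard definitions of the classical fourth order Runge-Kutta method, which are consistent with the weights in equation (3.13).

% one step of the classical fourth order Runge-Kutta method for dx/dt = f(x,t)
function x_next = rk4_step(f, x, t, dt)
    k1 = dt * f(x,          t);
    k2 = dt * f(x + k1/2,   t + dt/2);
    k3 = dt * f(x + k2/2,   t + dt/2);
    k4 = dt * f(x + k3,     t + dt);
    x_next = x + k1/6 + k2/3 + k3/3 + k4/6;   % weighted sum as in (3.13)
end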
In linear curve fitting, a design function y(x) that is to be fitted to the data is a linear combination of n known basis functions fi(x):

y(x, ψ) = ψ1 f1(x) + · · · + ψn fn(x) = ∑_{i=1}^{n} ψi fi(x). (3.15)

The parameters that are determined in the solution of the linear curve fitting problem are the n linear coefficients

ψ = [ψ1, ψ2, . . . , ψn]ᵀ. (3.16)
For N given data points ξi at their respective locations xi with respect to the independent variable x, one optimality criterion that can be used for the fit is the sum of the squared vertical distances between the data points ξi and the design function evaluated at those locations, y(xi), i.e. the vertical distance least squares fit. This means that in order to achieve the best fit under the assumption of Gaussian measurement noise, the following functional is to be minimized:

ε(ψ) = ∑_{i=1}^{N} (ξi − y(xi, ψ))²  (3.17)

or, in vector form,

ε(ψ) = ‖ξ − y(ψ)‖₂²  (3.18)

with the discrete values ξi of the data that is to be fitted stacked on top of each other in the vector

ξ = [ξ(x1), ξ(x2), . . . , ξ(xN)]ᵀ. (3.19)
For a linear combination of known basis functions fi(x), y(ψ) can also be described as the matrix product
y(ψ) = Vψ (3.21)
This means that for the linear curve fitting problem the objective function can be described
as
ε(ψ) = [ξ − Vψ]T [ξ − Vψ]. (3.23)
The parameters ψ that lead to the best fit in the vertical distance least squares sense are
the ones that minimize this function.
By differentiating equation (3.23) with respect to ψ and setting the derivative to zero, an extremum can be found. The resulting criterion for this extremum is the system of normal equations VᵀVψ = Vᵀξ, i.e. the least squares solution of

ξ = Vψ (3.25)

which is given by the pseudoinverse

ψ = V⁺ξ. (3.26)
The important insight to take from this is that for the linear curve fitting problem an
explicit solution can be formulated that directly leads to the optimal solution. As will be
seen later, for nonlinear problems this is not the case. Iterative solution methods have to
be applied, which – starting from an initial guess – try to minimize the objective function
(3.18) step by step. The progress of the iterative solution method is determined by the
numerical optimization algorithm as well as by the topology of the objective function.
These iterative solvers can of course also be applied to linear problems. To discuss the
topology of the linear curve fitting problem, an example problem is looked at. The design
function that is to be fitted to some data is
y(x, ψ) = ψ1 x2 + ψ2 x. (3.28)
The data ξ is generated by evaluating y(x, ψ) for ψexct = [2, 1/2]ᵀ at 51 evenly spaced points xi sampled from the interval x ∈ [−2, 3], i.e. with a step size of Δx = 0.1. Furthermore, noise δ was added to the measurement data. This noise was sampled from the normal distribution N(0, 1).
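This example can be reproduced with a few lines of MATLAB; the sketch below generates one noise realization (which of course differs from run to run) and computes the least squares fit via the explicit solution (3.26).

% linear curve fitting example y(x,psi) = psi1*x^2 + psi2*x, cf. (3.28)
x        = (-2:0.1:3).';                 % 51 sample locations
psi_exct = [2; 0.5];                     % exact parameters
V        = [x.^2, x];                    % matrix of evaluated basis functions
xi       = V*psi_exct + randn(size(x));  % noisy data, delta sampled from N(0,1)
psi_hat  = pinv(V)*xi;                   % explicit solution (3.26); V\xi is equivalent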
These data points as well as the fitted curve can be seen in Fig. 3.1. As equation (3.28)
has two parameters ψ1 and ψ2 , the value of the goodness-of-fit criterion ε(ψ) can be
visualized as a contour plot for a region of the two dimensional parameter space. This
can be seen in Fig. 3.2 for ψ1 ∈ [0, 5] and ψ2 ∈ [−2.5, 2.5]. As is typical for linear
curve fitting problems with linearly independent basis functions, the cost function forms
an ellipsoidal valley with one global minimum. This is due to the fact that the objective
function ε(ψ) is the result of squaring linear functions in ψ.
Figure 3.2: Contour plot of the objective function for a linear curve fitting problem in two
parameters ψ1 and ψ2 .
Any iterative optimization method needs to be initialized with an initial guess ψ0 . The
algorithm will in general compute steps in the parameter space that lead to a decrease in
the objective function value ε(ψ0 ). It is quite clear that for a linear curve fitting problem
with linearly independent basis functions taking downhill steps starting from any initial
point in the parameter space will eventually lead to a convergence towards the one global
minimum due to the convex topology of the objective function as visualized in Fig. 3.2.
For nonlinear curve fitting problems, this is not always the case. There might exist multi-
ple local minima which make convergence to the global minimum difficult. As an exam-
ple the function
y(x, ψ) = sin(ψx) (3.29)
is to be looked at. The data ξ that is to be fitted was generated by evaluating y(x, ψ) for
ψ = ψexct = 3 at 51 evenly spaced points x ∈ [0, 2π]. The objective function ε(ψ) for this curve fitting problem, formulated as in (3.18), evaluated for ψ ∈ [−10, 10] can be seen in Fig. 3.3. There is clearly one global minimum at ψ = ψexct. But there are
also other local minima. This means that any initial guess that is not near the exact value
might lead to a wrong result as the numerical optimization will converge into the wrong
valley.
Equation (3.29) can be augmented by adding a linear coefficient making it a function with
two parameters
y(x, ψ) = ψ1 sin(ψ2 x). (3.30)
Figure 3.3: The objective function for the nonlinear curve fitting problem in one parameter
for y(x) = sin(ψx).
Again, a curve fitting problem can be defined based on data points ξ generated by eval-
uating equation (3.30) for ψ = ψexct = [1, 3] at 51 evenly spaced points x ∈ [0, 2π]. The
objective function ε(ψ) for this example problem is visualized in Fig. 3.4 as a contour
plot for ψ1 ∈ [−4, 4] and ψ2 ∈ [−5, 5]. In this case there are actually two global minima, at ψ = ψexct and at ψ = −ψexct, as the sine is an odd function. Making a slice through this contour plot at ψ1 = 1 would result in the two dimensional plot of ε(ψ1 = 1, ψ2) over ψ2 as in Fig. 3.3.
As can be seen in Fig. 3.3 for one dimension and in Fig. 3.4 for two dimensions, the topology of a nonlinear curve fitting problem might look irregular on a global scale. But near a minimum it is actually shaped similarly to that of a linear curve fitting problem. This is in fact true for any neighbourhood around a specific local point in ψ. Any differentiable function can be approximated by a Taylor series expansion around a local point. This means that
can be approximated by a Taylor series expansion around a local point. This means that
a linear curve fitting problem can be defined, using the linearised model derived from a
first order Taylor series expansion. The objective function of this linear curve fitting prob-
lem will of course look similar to the nonlinear curve fitting problem around the point of
linearisation.
Figure 3.4: The objective function for the nonlinear curve fitting problem in two parame-
ters for y(x) = ψ1 sin(ψ2 x).
As a further example, the exponential function

y(x, ψ) = e^(ψx) (3.31)

is considered. The data ξ that is to be fitted was generated by evaluating y(x, ψ) for ψ = ψexct = 0.5 at 51 evenly spaced points x ∈ [−2, 3]. The evaluation of the objective function ε(ψ) for this curve fitting problem for ψ ∈ [−1, 1] can be seen in Fig. 3.5. There is clearly one global minimum at ψ = ψexct. As can be seen in the figure, this curve fitting problem results in a convex objective function similar to the linear curve fitting problem in section 3.2.1. Therefore the solution of the numerical optimization of ε(ψ) will hardly depend on variations of the initial guess ψ0. The exponential function defined in equation (3.31) can be extended by adding a second parameter in the form of a linear coefficient:

y(x, ψ) = ψ1 e^(ψ2 x). (3.32)
As before, a curve fitting problem can be defined based on data points ξ. These data points are sampled from the solution of equation (3.32) at 51 evenly spaced points x ∈ [−2, 3] for ψexct = [2, 0.5]ᵀ. The objective function ε(ψ) for this example problem is visualized in Fig. 3.6 as a contour plot for ψ1 ∈ [−2, 3] and ψ2 ∈ [−1, 1]. As can be seen, ε(ψ) again appears to be convex, as can be expected when a linear coefficient is added to the already convex single parameter problem based on the exponential function (3.31).
Figure 3.5: The objective function for the nonlinear curve fitting problem in one parameter for y(x) = e^(ψx).

As is shown by this example, just because a problem is labelled as nonlinear does not mean that it needs to be ill behaved. Nevertheless, one has to be mindful of potential local minima and verify the results of a nonlinear curve fitting problem accordingly.
Figure 3.6: The objective function for the nonlinear curve fitting problem in two parame-
ters for y(x) = ψ1 eψ2 x .
The parameters that are to be determined are the constant coefficients, the weights, of the known basis functions fi(x). The function y(x, ψ) might be nonlinear in the independent variable x, as some of the basis functions are nonlinear in x, but y(x, ψ) is linear in the parameters ψ:

∂y(x, ψ)/∂ψi |x=xk = const. (3.34)

In a nonlinear curve fitting problem y(x, ψ) is a nonlinear function in ψ:

∂y(x, ψ)/∂ψi |x=xk = g(ψ) (3.35)
This means that it is not possible to formulate an explicit solution for ψ as it is for the linear curve fitting problem. Different numerical solution methods can be applied in order to minimize the objective function ε(ψ) in an iterative manner. Starting at an initial guess ψ0, the gradient at that position in the parameter space can be evaluated. Different methods use this gradient information in different ways in order to derive a step Δψ towards the minimum.
Gradient Descent
If y(x, ψ) is differentiable with respect to ψ, then the gradient descent or steepest descent
method can be used for the numerical minimization of ε(ψ). Starting from an initial
guess ψ0 for the parameters, an updated parameter set can be found by making a step Δψ
into the negative direction of the gradient of ε(ψ).
Δψ = −α ∂ε(ψ)/∂ψ |ψ=ψ0 (3.36)
For the sum of squares criterion ε(ψ) = [ξ − y(ψ)]ᵀ[ξ − y(ψ)] this results in

∂ε(ψ)/∂ψ = 2 [ξ − y(ψ)]ᵀ ∂/∂ψ [ξ − y(ψ)]  (3.38)
         = −2 [ξ − y(ψ)]ᵀ ∂y(ψ)/∂ψ  (3.39)
         = −2 [ξ − y(ψ)]ᵀ J(ψ)  (3.40)
ψ1 = ψ0 + αΔψ0 . (3.43)
The gradient descent method is easy to implement but has the disadvantage that it offers little control over the convergence behaviour besides scaling the step size with the coefficient α [23]. Selecting a value that is too small might lead to poor convergence due to making little progress in the parameter space. Selecting a value that is too large might lead to poor convergence due to overshooting of the minimum or the trough of the ellipsoidal valley.
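A minimal MATLAB sketch of the method is given below; the functions y_of_psi(psi) and jac_of_psi(psi) for the model output and its Jacobian, the step size coefficient α, the iteration limit and the stopping tolerance are assumed, problem dependent inputs.

% gradient descent for the sum of squares criterion epsilon(psi)
psi   = psi0;                              % initial guess
alpha = 1e-3;                              % step size coefficient (problem dependent)
for iter = 1:max_iter
    r    = xi - y_of_psi(psi);             % residual
    grad = -2 * jac_of_psi(psi).' * r;     % gradient of epsilon, cf. (3.40)
    dpsi = -alpha * grad;                  % step in the negative gradient direction
    psi  = psi + dpsi;
    if norm(dpsi) < tol, break; end        % stop when the steps become small
end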
Gauss-Newton
The Gauss-Newton method is a nonlinear curve fitting algorithm, which works by ap-
proximating the nonlinear design function y(x, ψ) by a first order Taylor series expansion
around the current point ψ0 in the parameter space. In each iteration a step Δψ is com-
puted, which – according to this linearised model – minimizes the sum of squares opti-
mality criterion.
For any initial guess ψ0 there will be a discrepancy between ξ and y(ψ0):

ξ = y(ψ0) + Δy (3.44)

The Gauss-Newton method tries to find a step Δψ, based on the first order Taylor series expansion of y(ψ) around ψ0, which minimizes this discrepancy. The resulting step Δψ is of course the minimizer of the linear curve fitting problem that arises when using the linearised model (3.45); the updated parameter set is then
ψ1 = ψ0 + αΔψ. (3.49)
The relaxation factor α is typically set to be α < 1. This is due to the fact that the
linearisation of the nonlinear function y(x, ψ) might only be a good approximation in a
close proximity around ψ0 . Making too large steps αΔψ might lead to poor convergence
due to overshooting past the minimum of ε(ψ).
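One Gauss-Newton iteration can be sketched in MATLAB as follows; y_of_psi and jac_of_psi are again assumed functions returning the model output and its Jacobian, and alpha is the relaxation factor discussed above.

% one Gauss-Newton step: solve the linearised least squares problem for dpsi
J    = jac_of_psi(psi0);                 % Jacobian of y(psi) at psi0
dpsi = J \ (xi - y_of_psi(psi0));        % least squares solution of J*dpsi = xi - y(psi0)
psi1 = psi0 + alpha*dpsi;                % relaxed update as in (3.49), alpha <= 1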
Levenberg-Marquardt
The Gauss-Newton step can equivalently be computed from the normal equations

JᵀJ Δψ = Jᵀ (ξ − y(ψ0)),

which is equation (3.46) multiplied with Jᵀ from the left. In the Levenberg-Marquardt algorithm this equation is modified by adding a damping term λI to the matrix product on the left hand side:

(JᵀJ + λI) Δψ = Jᵀ (ξ − y(ψ0)) (3.51)
where λ is a scalar and I is an identity matrix of appropriate size. This equation can be derived by adding a penalty term for the step Δψ in the parameter space to the cost function (3.48) of the Gauss-Newton method:

ε̃(Δψ) = ‖ξ − y(ψ0) − J(ψ0)Δψ‖₂² + λ ‖Δψ‖₂² .

Setting the derivative of this cost function with respect to Δψ to zero results in equation
(3.51). Depending on the coefficient λ this leads to a different convergence behaviour. For large values of λ the algorithm makes small steps in the direction of the steepest descent,

Δψ ≈ (1/λ) Jᵀ(ψ0) [ξ − y(ψ0)]. (3.53)

For small values of λ the algorithm makes steps according to the Gauss-Newton method.
The damping coefficient λ does not have a fixed value, and an important aspect of the Levenberg-Marquardt algorithm is the way in which it is adapted during the iteration procedure. Usually λ is initialized with a larger value to ensure a quick descent from the initial guess ψ0 along the direction of the gradient. Smaller values of λ ensure good convergence in the region near a minimum, where the Gauss-Newton method shows quadratic convergence if the objective function value at the minimum is approximately zero: ε(ψ*) ≈ 0 [26].
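A simplified MATLAB sketch of such an adaptation strategy is given below; the initial value of λ and the factors by which it is increased or decreased are assumed example choices, not prescribed by the algorithm.

% simplified Levenberg-Marquardt iteration for the sum of squares criterion
psi = psi0;  lambda = 1e2;
for iter = 1:max_iter
    r    = xi - y_of_psi(psi);
    J    = jac_of_psi(psi);
    dpsi = (J.'*J + lambda*eye(numel(psi))) \ (J.'*r);   % damped step, equation (3.51)
    if sum((xi - y_of_psi(psi + dpsi)).^2) < sum(r.^2)
        psi = psi + dpsi;  lambda = lambda/10;           % step accepted: trust the linear model more
    else
        lambda = lambda*10;                              % step rejected: move towards gradient descent
    end
    if norm(dpsi) < tol, break; end
end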
In the curve fitting problem a function y(x, ψ) is to be fitted to the data points ξ given
at their locations x = [x1, . . . , xN]. Therefore the evaluation of y(x, ψ) at these locations is
described as the vector valued function y(ψ). The Jacobian J, which is needed in some
algorithms for the numerical optimization of the sum of squares goodness-of-fit criterion
ε(ψ), contains the derivatives of y(ψ) with respect to each of the n parameters ψ.
J(ψ0) = [ ∂y(ψ)/∂ψ1 |ψ=ψ0   ∂y(ψ)/∂ψ2 |ψ=ψ0   . . .   ∂y(ψ)/∂ψn |ψ=ψ0 ] (3.55)
For the system identification problem, y(ψ) – the output of the system model – is not
known analytically. It can merely be evaluated for any selected point in the parameter
space ψ by numerical integration of the initial value problem. Therefore, in order to
apply a gradient based optimization method, a numerical approximation of the Jacobian
J needs to be found.
This is done by applying a finite difference formula and by varying one parameter at a
time for each column of the Jacobian J.
∂y(ψ)/∂ψi |ψ=ψ0 ≈ [y(ψ0 + δi) − y(ψ0)] / Δψi          . . . forward (3.56)
∂y(ψ)/∂ψi |ψ=ψ0 ≈ [y(ψ0 + δi) − y(ψ0 − δi)] / (2Δψi)   . . . central (3.57)
∂y(ψ)/∂ψi |ψ=ψ0 ≈ [y(ψ0) − y(ψ0 − δi)] / Δψi           . . . backward (3.58)

with

δi = Δψi ei (3.59)
The central finite difference formula needs 2n evaluations of y(ψ) for each approximation of the Jacobian J, whereas the forward and backward formulae suffice with (n + 1) evaluations.
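A forward difference approximation of the Jacobian as in equation (3.56) can be sketched in MATLAB as follows; dpsi is a vector containing the perturbation sizes Δψi.

% numerical approximation of the Jacobian J(psi0) via forward differences (3.56)
function J = jacobian_fd(y_of_psi, psi0, dpsi)
    y0 = y_of_psi(psi0);
    n  = numel(psi0);
    J  = zeros(numel(y0), n);
    for i = 1:n
        e_i    = zeros(n,1);  e_i(i) = 1;                        % unit vector, cf. (3.59)
        J(:,i) = (y_of_psi(psi0 + dpsi(i)*e_i) - y0) / dpsi(i);  % one column per parameter
    end
end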
Regardless of whether a one sided or the central formula is used, the numerical approxi-
mation of the Jacobian using finite differences is computationally expensive [27]. As an
alternative the Broyden rank-1 update formula can be used to estimate the Jacobian for a
new parameter based on the known Jacobian for a nearby point in the parameter space.
The disadvantage of this method is that the successive updates of the Jacobian via Broyden's formula lead to a divergence from the real value. Therefore it is advisable to perform a more precise approximation via a finite difference formula at least every few steps [23].
The variable projection method was introduced in [30] for the solution of nonlinear least squares problems whose variables separate. This means that the design function y(ψ, x) of the curve fitting problem can be described as a linear combination of nonlinear basis functions fi(β, x):

y(ψ, x) = y(α, β, x) = ∑_{i=1}^{m} αi fi(β, x). (3.62)
The problem of minimizing this functional is also called a separable least squares prob-
lem. It can be solved in two steps by applying the variable projection method as shown in
[31]:
As the basis functions fi (β, x) only depend on the set of parameters β, for any choice
of β a matrix V(β) similar to a Vandermonde matrix containing the evaluated nonlinear
basis functions can be defined:
         ⎡ f1(β, x1)  · · ·  fm(β, x1) ⎤
V(β) =   ⎢     ⋮       ⋱        ⋮     ⎥ .        (3.64)
         ⎣ f1(β, xn)  · · ·  fm(β, xn) ⎦
This means that the objective function ε(α, β) of the separable least squares problem can
be written as
ε(α, β) = ‖ξ − V(β)α‖₂² . (3.65)
For any β the linear coefficients α can be computed as the solution of a linear least squares
problem
α(β) = V+ (β)ξ. (3.66)
Substituting this back into the objective function (3.65) yields

ε(β) = ‖ξ − V(β)V⁺(β)ξ‖₂² , (3.67)

which is a function solely dependent on β. Therefore the two steps of solving the separable nonlinear least squares problem are:

1. β* = argmin_β ε(α, β) with α = V⁺(β)ξ (3.68)

2. α* = V⁺(β*)ξ. (3.69)
The advantage of the variable projection method is that the number of parameters that need to be estimated in the nonlinear optimization is reduced. This leads to better computa-
tional efficiency and a greater likelihood of finding a global minimum, rather than a local
one [30, 32].
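The two steps (3.68) and (3.69) can be sketched in MATLAB as follows, assuming a function V_of_beta(beta) that builds the matrix (3.64) of evaluated basis functions and an initial guess beta0 for the nonlinear parameters.

% variable projection: nonlinear optimization over beta only, alpha follows linearly
eps_red    = @(beta) norm(xi - V_of_beta(beta)*(V_of_beta(beta)\xi))^2;  % reduced functional (3.67)
beta_star  = fminsearch(eps_red, beta0);      % step 1, equation (3.68)
alpha_star = V_of_beta(beta_star)\xi;         % step 2, equation (3.69)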
The variable projection method has found application in various disciplines, see for example
[31] for an overview. It was also applied in system identification problems. For example
in [33] it is applied in the identification of a Wiener model, which consists of a discrete
LTI system followed by a static non-linearity. In [15], a nonlinear system is identified as a
composite local linear state space model, a composition of discrete LTI systems, with the
help of the variable projection method. The application in [9] for the system identifica-
tion of a continuous LTI system utilizing the Global Least Squares solution method was
already mentioned in section 1.4.
Unfortunately, the application of the variable projection method to the system identification of nonlinear state space models appears to be severely limited by the fact that without
access to an analytic solution the state trajectory of a nonlinear dynamic system has to be
approximated by a numerical solver. This means that in most cases there is no sum of
basis functions that can be superimposed, no projection matrix that can be formulated
and no separation of variables into ones with linear and nonlinear influence on the basis
functions. It is of course possible to design a nonlinear dynamic system whose output is the superposition of two independent dynamic processes. In such a case it would be advantageous to apply the variable projection method to the system identification problem. However, for the problems that are studied in chapter 4 this is not the case.
A problem for the numerical optimization can arise when the parameters that are to be identified differ by several orders of magnitude, i.e.

ψi / ψj = 10ⁿ , n ≫ 1. (3.70)
If the general range or order of magnitude of some parameter values is known, then a simple linear transformation can be applied in order to put them on a similar scale. For example, when ψ̂i is the expected value of ψi, then a parameter transformation which projects ψ̂i → 1 but keeps zero at zero could be used:

[0, ψ̂i] → [0, 1] (3.71)
When ψmin are the minimum values of the parameters ψ and ψmax are the maximum values, then each parameter can be projected onto the interval [0, 1] by

φ(ψ) = (ψ − ψmin) / (ψmax − ψmin). (3.72)
The above case where ψi = 0 remains at ψi = 0 can of course be achieved by setting ψmin
to be the zero vector.
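In MATLAB, the transformation (3.72) and its inverse can be written element-wise; the vectors psi_min and psi_max of expected minimum and maximum parameter values are assumed to be given.

% projection of each parameter onto the interval [0,1] and back, cf. (3.72)
phi     = @(psi) (psi - psi_min)./(psi_max - psi_min);   % scaled parameters
phi_inv = @(p)   psi_min + p.*(psi_max - psi_min);       % back-transformation to physical values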
A normalization or rescaling of the parameter space can help with the convergence of numerical optimization methods. One convergence criterion for such an algorithm is often defined to be a lower limit of the acceptable step size Δψ. If the exact value of one of the parameters is of smaller or equal order compared to the smallest accepted step size, then that parameter might be identified less precisely. The extent of this numerical dis-
cretization error of course also depends on the influence that variations of this parameter
have on the system output and on the noise disturbance in the identification data.
One way of counteracting this is to set the mentioned convergence criterion to smaller
values. This can be imagined as using a scope to look at the topology of the objective
function in the neighbourhood of the minimum in more detail. But a rescaling of the
one problematic parameter takes on the problem in a slightly different way. Instead of
investing more computational resources in order to essentially resolve the whole param-
eter space on a smaller scale, the parameter space is stretched in that one problematic
dimension to make convergence easier.
In the system identification problem the parameters of a system model are to be deter-
mined such that the behaviour of the model reflects the observed behaviour of the system.
This problem can be solved numerically by utilizing numerical integration and numerical
optimization methods.
The system model is given by the state-space and observer equations, e.g. for a single input single output system

ẋ(t) = f(x(t), u(t), t, p), (3.73)
y(t) = g(x(t), u(t), t, p), (3.74)

or, in the case that it is given by higher order differential equations, it needs to be transformed into a state-space representation.
The given n observational data points

ξ = [ξ0, ξ1, . . . , ξn−1]ᵀ, (3.75)

which correspond to a given input sequence u(t), were measured at the given points in time

t = [t0, t1, . . . , tn−1]ᵀ. (3.76)
This means that for any set of system parameters p and initial conditions x0 = x(t0 ) a
numerical solution method can be applied in order to compute the response of the system
model to the given input sequence u(t) at these points in time t:
y(ψ) = [y(ψ, t0), y(ψ, t1), . . . , y(ψ, tn−1)]ᵀ, (3.77)
where ψ contains the model parameters p and the initial conditions x0 that are to be de-
termined in the system identification. If all the initial conditions are assumed to be known,
then ψ = p.
The goal here is to define a function in MATLAB® , which – provided with the input
of any value for the set of parameters ψ that are to be identified – returns an estimate
of the system output y(ψ) for this parametrization. Any ODE-solver can be applied
for the solution of the initial value problem for equation (3.73). For general problems
the MATLAB® function ode45() can be applied. For the solution of stiff systems
ode23s() might be more advantageous. The observer equation (3.74) can then be used
to extract the system output y(ψ) from the state trajectories [x1 (ψ), x2 (ψ) ...]. Once the
system output y(ψ) is defined as a function in the parameters ψ that are to be identified,
it is also possible to implement a function which approximates the Jacobian J(ψ) of y(ψ)
by the use of finite difference formulae as described in section 3.2.2. This is necessary in
the case where a gradient based numerical optimization technique like the Gauss-Newton
or Levenberg-Marquardt algorithm is applied for the solution of the nonlinear curve fitting
problem. These algorithms usually have the sum of squares optimality criterion imple-
mented in their code and require the identification data ξ, the function that is to be fit to
that data y(ψ), the gradient information in the form of J(ψ) as well as an initial guess
for the parameter set ψ0 as input. The Nelder-Mead Simplex method implemented in the
MATLAB® function fminsearch() only requires the objective function ε(ψ) and an
initial guess ψ0 as an input. In this case the approximation of the gradient is not required
but the objective function needs to be implemented in MATLAB® utilizing the previously
defined function which returns y(ψ):
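The following is only a sketch of such an implementation, not a literal reproduction of the code used in this work. It assumes a state equation f(x, u, t, psi), known initial conditions x0, an input function u_of_t, and that the measured output is the first state variable; in practice model_output would be placed in its own function file.

% model output y(psi): numerical solution of the IVP (3.73) plus observer equation (3.74)
function y = model_output(psi, t, u_of_t, x0)
    [~, X] = ode45(@(tt, xx) f(xx, u_of_t(tt), tt, psi), t, x0);
    y = X(:,1);                      % observer: here y = x1 is assumed
end

% objective function (2.6) and its minimization with the Nelder-Mead Simplex method
epsilon = @(psi) sum((xi - model_output(psi, t, u_of_t, x0)).^2);
psi_opt = fminsearch(epsilon, psi0);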
There are different numerical solution methods for ODEs as well as different numerical
optimization methods for nonlinear curve fitting problems, some of which were men-
tioned previously. Any reader of this thesis should feel encouraged to combine any ODE
solver and any numerical optimization technique of their liking into an algorithm for the
system identification of nonlinear dynamic systems.
To summarize, here is a short overview of the implemented system identification proce-
dure:
1. Define the identification data [ξ, t] as well as the corresponding system input u(t).
3. Determine whether the initial conditions of the experiment x0 = x(t0 ) are known
or to be estimated
(a) if x0 is known: ψ = p
(b) if x0 is to be estimated: ψ = [pT , xT0 ]T
8. Perform the numerical optimization by providing the selected algorithm with the
necessary inputs: ε(ψ) and ψ0 or ξ, y(ψ) and ψ0 .
This should result in the identified parameter set ψ*.
Chapter 4

Case Studies
A special case of equation (4.1) can be obtained by setting the coefficient functions q(t)
and r(t) as well as the disturbing function p(t) to constant values. This results in the
simplified description of a body in vertical free fall. The state variable was changed from
x to v, as it in this case describes the velocity of a body of mass m subject to constant
gravitational acceleration g and a friction force proportional to the square of the velocity v².

v̇ − (c/m) v² = −g (4.2)
The general solution to this equation is known and can for example be found in [34]:
v(t) = vlim (e^(rt) − k e^(−rt)) / (e^(rt) + k e^(−rt)) (4.3)
with the terminal velocity vlim as the velocity at the stationary state where the gravitational
and the frictional force cancel each other out:
vlim = −√(mg/c) and (4.4)

r = √(cg/m). (4.5)
v(t) = (v0 + vlim tanh(rt)) / (1 + (v0/vlim) tanh(rt)). (4.6)
Integrating this function results in the analytic solution for the position s(t):

s(t) = s0 + ∫₀ᵗ v(t) dt (4.7)

s(t) = s0 − (vlim²/g) ln[ cosh(rt) + (v0/vlim) sinh(rt) ] (4.8)

with s0 = s(t = 0). (4.9)
These nonlinear equations (4.6) and (4.8) as the solution to the problem of the body in
free fall show that even such a relatively simple nonlinear dynamic system might present
an interesting object of study for the task of system identification.
ms̈ = ∑ Fi = Ff − Fg (4.10)
ms̈ = −c ṡ² sign(ṡ) − mg (4.11)

As the position of the mass s(t) isn't explicitly present in the equation, a change of variables can be performed. This results in an equation in the velocity, similar to equation (4.2):

ṡ = v (4.12)
s̈ = v̇ (4.13)
m v̇ = −c v² sign(v) − mg (4.14)
Whether the equation of motion (4.10), depending on the position s(t), or equation (4.14), depending on the velocity v(t), is used for the purpose of system identification depends
on what measurement data is available. The transformation of (4.10) into state-space
representation with x1 (t) as the position and x2 (t) as the velocity can be used in any case:
ẋ1 = x2 (4.15)
ẋ2 = −(c/m) x2² sign(x2) − g (4.16)

These two first order differential equations are combined into matrix form:

ẋ(t) = [ẋ1(t); ẋ2(t)] = [ x2(t); −(c/m) x2² sign(x2) − g ]. (4.17)
with c11 = 1 and c22 = 0 for the output y(t) = x1 (t) or c11 = 0 and c22 = 1 for the output
y(t) = x2 (t).
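For the numerical treatment below, the right hand side of (4.17) can be written as a small MATLAB function; note that the drag term is divided by the mass, consistent with equation (4.2).

% state space right hand side of the free fall model (4.17)
f_freefall = @(x, c, m, g) [ x(2);
                             -(c/m)*x(2)^2*sign(x(2)) - g ];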
The measurement data recorded during the experiment serve as the basis for the parameter estimation. It is assumed that both the initial position s0 = h
the initial velocity v0 as well as the mass m of the object and the gravitational acceleration
g are known. The only parameter that is to be determined is the drag coefficient c. It is
also assumed that the object is falling vertically straight down and doesn’t tumble during
the fall so that the drag coefficient remains constant over time.
One way of tackling this problem would be to find a model that can estimate the time it
takes for the mass to fall down the height h. This would mean that we are looking for the
solution to the boundary value problem
s(t0 ) = h t0 = 0 (4.19)
s(t f ) = 0 t f . . . extracted from the measurement (4.20)
The disadvantage of this is that we are only taking one measurement, the final time tf, into consideration.
Another approach would be to design the experiment in a way that the object reaches the
stationary state of terminal velocity vlim . If the terminal velocity can be estimated, then it
is trivial to estimate the drag coefficient from that using equation (4.4).
However, in the following section the system identification approach of minimizing the output error of the model in relation to the observed position measurement, as described in chapter 2, will be applied. As such we want to minimize the functional

ε(ψ) = ‖ξ − y(ψ)‖₂² . (4.21)
The position measurements ξ (ti ) observed at the discrete points in time ti are contained
in the vector ξ. The parameter ψ to be estimated is the drag coefficient c. The model out-
put y(ψ,t) can be computed in two ways. For any experimental data ξ, where the initial
velocity v0 ≤ 0, the analytic solution (4.8) can be used. Alternatively the solution to the
differential equation in state-space representation (4.17) can be computed by numerical
integration of the initial value problem. For other problems involving nonlinear differen-
tial equations there might not be any analytic solution available. Therefore the path of
applying a numerical solution method will also be pursued in this case. The analytic solution might still come in handy in case it is necessary to confirm the accuracy of the numerical approximation of the solution of the initial value problem.
The specific experiment designed to extract information about the drag coefficient c = ψ of an object of mass m = 10 kg is a drop from the height h = 30 m with an initial velocity v0 = 0, subject to the gravitational acceleration g = 9.81 m/s². The observed measurement is the position ξ(t) ≈ s(t). The exact position trajectory was generated from the analytic solution (4.8) for the exact parameter ψexct = 0.5 kg/m. This exact system output yexct(t) is sampled every ts = 0.1 s over the time interval [t0, tf] = [0, 3] s,
resulting in the vector yexct = y(ψexct ). Gaussian measurement noise δ that is added
to this measurement is generated by sampling pseudorandom values from the normal
distribution N (μ, σ ). These disturbances are selected to be rather large with standard
deviation of σ = 1 and zero mean μ = 0. Any experiment i that is performed this way
is assumed to result in an observed system output based on the same exact solution yexct
disturbed by a different sequence of independent random variables δi sampled from the
same normal distribution.
ξi = yexct + δi (4.22)
ξA = yexct + δA . (4.23)
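A sketch of this data generation in MATLAB is given below; for simplicity the exact trajectory is obtained here by numerical integration of (4.17), using the function f_freefall defined above, rather than from the closed form solution (4.8), which is equivalent for this purpose.

% generation of identification data set A for the free fall experiment
m = 10;  g = 9.81;  c_exct = 0.5;  h = 30;  v0 = 0;          % experiment parameters
t = (0:0.1:3).';                                             % sample times, ts = 0.1 s
[~, X] = ode45(@(tt, xx) f_freefall(xx, c_exct, m, g), t, [h; v0]);
y_exct = X(:,1);                                             % exact position trajectory
xi_A   = y_exct + randn(size(y_exct));                       % delta_A sampled from N(0,1)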
Based on equation (2.7), the cost function of the system identification problem based on experiment A can be defined as

εA(ψ) = ‖ξA − y(ψ)‖₂² . (4.25)
For problems like this, which involve only one parameter ψ that is to be estimated, it is
possible to visualize the objective function ε(ψ) over a selected region of the parameter
space. Figure 4.1 shows the cost function εA (ξA , ψ) based on experiment A as well as the
theoretical cost function ε(yexct , ψ) based on a measurement without any noise distur-
bances. Both are evaluated for ψ ∈ [0, 1]. As can be seen, for this problem ε(ψ) appears
to be a convex function with one global minimum. The location of this minimum is found by numerical optimization of the objective function.
Table 4.1 shows the results of the numerical optimization of the above cost function using
the Simplex and the Gauss-Newton method. y(ψ) was computed from the state space
representation (4.17) by numerical integration using the Runge-Kutta method. For the
solution using the Gauss-Newton method the derivative of y(ψ) with respect to ψ was
computed as a numerical approximation using a finite difference formula.

Figure 4.1: Local minima of the objective function for the free fall experiment A.

The initial starting point was set at ψ0 = 0. Because of the mentioned convex form of the cost function,
convergence to the minimum from that initial point in the parameter space isn’t a prob-
lem. The deviation of these two results is not of special relevance here. It is due to the
different path in the parameter space that the applied numerical methods take during their
iterative solution procedure and due to the different convergence criteria that stop the it-
eration. The important result is that both methods lead to approximately the same result
which deviates from the exact parameter ψexct = 0.5 kg/m. This error in the identified
parameter is not due to wrong convergence or getting trapped in a wrong local minimum.
The global minimum of the objective function ε(ξA , ψ) actually lies at the identified lo-
cation ψA . The error in the identified parameter with respect to the exact one is due to the
disturbance induced variation of the cost function. This is also visible in Fig. 4.1. The
minima of each curve, which are marked by a circle, are not the same. The large deviation
is to be expected for the large noise level that was purposefully selected to achieve this
clear distinction.
Table 4.1: Results of the system identification based on the data from the free fall experi-
ment A.
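The thesis applies the Simplex and the Gauss-Newton method to this problem. As a rough stand-in, the following MATLAB sketch uses fminsearch (Nelder-Mead simplex) and lsqnonlin with the Levenberg-Marquardt option, which behaves like a damped Gauss-Newton iteration and approximates the Jacobian by finite differences; lsqnonlin requires the Optimization Toolbox. The helpers and data from the previous sketches are reused.

residA   = @(psi) xi_A - modelOutput(psi, ti, x0, m, g);   % residual vector for experiment A
psi0     = 0;                                              % initial starting point
psi_simp = fminsearch(@(psi) sum(residA(psi).^2), psi0);   % simplex (Nelder-Mead) result
opts     = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
psi_lm   = lsqnonlin(residA, psi0, [], [], opts);          % Gauss-Newton-like result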
Fig. 4.2 shows the exact system output yexct (t), the noise perturbed observations ξA
and the estimated output of the model y(ψA ,t). As the exact parametrization is known, it
is also possible to compare the respective velocity trajectories x2 (ψexct ,t) and x2 (ψA ,t),
which are shown in Fig. 4.3.

Figure 4.2: Position data and estimate of the object in free fall with drag due to air resistance.

The portrayed performance of the model seems to be the best that is possible based on the given data. Ways of reducing the model uncertainty would be to improve the measurement accuracy, to reduce the sample time ts , or to increase the measurement interval [t0 ,t f ] by increasing t f , in the hope of improving the signal to noise ratio in order to reduce the disturbance induced variation of the identified parameters [17]. This implies changing the experimental setup in a way to be able to make better or more observations. Another way of reducing the model uncertainty without changing the experimental setup is explored in the next section.
Figure 4.3: Velocity estimate of the object in free fall with drag due to air resistance.
For the analysis in this section, two measurement data sets
ξA = yexct + δA (4.27)
ξB = yexct + δB (4.28)
generated as described in section 4.1.3 are assumed to be available for the estimation of the drag coefficient ψ. A cost function εB (ψ) = ε(ξB , ψ) can be defined for the measurement ξB analogously to equation (4.25). In a similar way, a cost function εA,B (ψ) for the simultaneous vertical distance least squares fit to both data sets ξA and ξB can be defined by stacking the measurement vectors on top of each other.
Comparing the two identified parameters ψA and ψB to the exact parameter ψexct , one of three orderings must hold: ψexct < ψ1 < ψ2 , or
ψ1 < ψ2 < ψexct (4.35)
or
ψ1 < ψexct < ψ2 (4.36)
where ψ1 is the smaller and ψ2 is the larger value of ψA and ψB . As we are necessarily
combining a better and a worse measurement, it can be expected that ψA,B is at least
a better parametrization than the parameter identified based on the worse measurement.
The question that arises from this is whether ψA,B is always in between ψA and ψB . If
true, this would mean that ψA,B is a kind of average result of experiment A and B and that
only in the case of (4.36) could ψA,B possibly be better than both ψA and ψB . However,
without knowledge about the exact parameter ψexct , which wouldn’t be available in a real
system identification problem, the only distinction that can be made between ψA and ψB
is that one of them is larger and the other one smaller. In this case ψA,B would still be the
best guess of a parametrization that depicts the behaviour of the system most accurately.
As ξA and ξB contain measurement samples drawn at the same points in time in reference to the time interval [t0 ,t f ] of the experiment, another data set can be generated by taking the mean of those two measurement sets and defining a fourth cost function accordingly.
ξAB = (ξA + ξB ) / 2 (4.37)
εAB (ψ) = ε(ξAB , ψ) (4.38)
ψAB = arg min_ψ εAB (ψ) (4.39)
As was presented in Fig. 4.1 for the first data set ξA , Fig. 4.4 shows the cost functions
εA , εB , εA,B and εAB for the region ψ ∈ [0, 1] of the parameter space. For visual clarity the
vertical axis is set to a logarithmic scale. For the system identification procedure, equation
(4.17) was again numerically integrated as described in section 4.1.4. The values of the
parameters that were identified using the Gauss-Newton method can be seen in table 4.2.
The resulting estimated state trajectories based on the identified parameters can be seen
in Fig. 4.5 and 4.6.
Table 4.2: Results of the system identification based on the data from the free fall experi-
ment A and B.
As it turns out ψA,B is identical to ψAB with respect to not only the six digits displayed
here but up to the eleventh digit. To analyse this result, we will look at one step of the
Gauss-Newton minimization of the cost function εA,B (ψ) that was defined in (4.32).
⎡ξA − y(ψi )⎤   ⎡J(ψi )⎤
⎣ξB − y(ψi )⎦ = ⎣J(ψi )⎦ Δψi (4.40)
Solving this system of equations in the least squares sense is the same as simultaneously solving the two systems of equations ξA − y(ψi ) = J(ψi ) Δψi and ξB − y(ψi ) = J(ψi ) Δψi . Their combined least squares solution coincides with the least squares solution of ξAB − y(ψi ) = J(ψi ) Δψi , which is a step in the Gauss-Newton minimization of the cost function εAB (ψ). Therefore,
each step in the minimization of εA,B (ψ) is the same as in the minimization of εAB . This
implies that both iterative solution procedures should lead to the same results, as has been
observed. Deviations from this identical result might occur due to different activation of
the convergence criteria. While the minima of εA,B (ψ) and εAB (ψ) lie at the same place
in the parameter space ψ, the value of εA,B (ψ) ≠ εAB (ψ), as can be seen in Fig. 4.4.
Therefore, a limit set on the minimum allowed change of the objective function value
might trigger the stop of the iterative solution method for one but not the other.
To summarize, this means that fitting a model to multiple measurement data sets which
are based on the same experimental setup is the same as fitting the model to the mean of
those data sets.
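This equivalence can be checked numerically with a small MATLAB sketch, reusing the data and the modelOutput helper from the earlier sketches; the two identified parameters are expected to agree up to the convergence tolerances of the optimizer.

xi_AB      = (xi_A + xi_B)/2;                                   % mean data set, eq. (4.37)
yOf        = @(psi) modelOutput(psi, ti, x0, m, g);
resid_AB   = @(psi) [xi_A; xi_B] - [yOf(psi); yOf(psi)];        % stacked residual
resid_mean = @(psi) xi_AB - yOf(psi);                           % residual w.r.t. the mean
opts    = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
psi_AB1 = lsqnonlin(resid_AB,   0, [], [], opts);               % fit to both data sets
psi_AB2 = lsqnonlin(resid_mean, 0, [], [], opts);               % fit to the mean data set
fprintf('difference: %.3e\n', abs(psi_AB1 - psi_AB2));          % expected to be very small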
Figure 4.4: Local minima of the objective functions for the free fall experiment A and B.
Figure 4.5: Estimated position trajectories for the free fall experiment based in the data
sets A and B.
Figure 4.6: Estimated velocity trajectories for the free fall experiment based in the data
sets A and B.
For the Kolmogorov-Smirnov test, each set of identified parameters x is normalized according to its sample mean x̄ and sample standard deviation sx :
x̃ = (x − x̄) / sx . (4.46)
Table 4.3 shows the resulting p-values pKS (ψi ) in addition to the sample mean ψ̄ i and
sample standard deviation sψi corresponding to each set of identified parameters ψA , ψB ,
ψA,B and ψAB . In particular the results for ψA and ψA,B can be seen in Fig. 4.7, where
they are visualized in the form of empirical and normal cumulative distribution functions
(CDF). In addition to that Fig. 4.8 and 4.9 show the comparison of a histogram and the
normal distribution computed with the sample mean and sample standard deviation for
each of these two identified parameter sets. The important insight that is to be taken from
this is that Gaussian measurement noise apparently results in a normal distribution of the
identified parameters due to disturbance induced variation. This is also confirmed by the
Kolmogorov-Smirnov tests for a significance level of α = 0.05 < pKS (ψi ).
As is exhaustively demonstrated – at least for this example problem – normally distributed
disturbances lead to normally distributed identified parameters. One thing to note is that
the results obtained here are based on the numerical optimization of the objective function.
This means that the error of the identified parameter ψ compared to the exact one ψexct is
not only due to the disturbance induced variation of the objective function ε(ψ) but also
due to the numerical accuracy of the solution method. However, as long as the disturbance
induced variation is large compared to the numerical accuracy, the identified parameters
should remain approximately normally distributed.
        ψA          ψB          ψA,B        ψAB
ψ̄ i     0.500277    0.500331    0.500041    0.500040
sψi     0.0258864   0.0262175   0.018495    0.0184945
pKS     0.256843    0.142756    0.424425    0.42518
Table 4.3: Results of the Monte-Carlo simulation of the system identification of the free
fall experiment based on one and two data sets.
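A condensed MATLAB sketch of the Monte-Carlo analysis for the single data set case, again reusing the modelOutput helper and the exact output y_exct from the earlier sketches (kstest is part of the Statistics and Machine Learning Toolbox):

N    = 1000;
psiN = zeros(N, 1);
opts = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt', 'Display', 'off');
for i = 1:N
    xi_i    = y_exct + randn(size(y_exct));                          % new noise realization
    psiN(i) = lsqnonlin(@(p) xi_i - modelOutput(p, ti, x0, m, g), 0, [], [], opts);
end
psi_tilde = (psiN - mean(psiN)) / std(psiN);                         % normalization, eq. (4.46)
[~, pKS]  = kstest(psi_tilde);                                       % p-value of the KS test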
Figure 4.7: Cumulative distribution function of the results of the Monte-Carlo simulation
for the free fall experiment based on one and two data sets A and B.
Figure 4.8: Histogram of the results of the Monte-Carlo simulation for the free fall exper-
iment based on one data set A.
Figure 4.9: Histogram of the results of the Monte-Carlo simulation for the free fall exper-
iment based on two data sets A and B.
A least squares fit of the function
sψ (nexp ) = k / √nexp (4.47)
to the sample standard deviations sψ (nexp ), which resulted in the coefficient kLSQ = 0.0259 ≈ sψ (nexp = 1), can also be seen in the figure. This function seems to coincide with the equation
sψ (nexp ) = sψ (nexp = 1) / √nexp (4.48)
which is to be expected under the hypothesis that the least squares fit of one function to multiple data sets is equal to the fit of the function to the mean of those data sets. In this case the disturbance induced variation is reduced by the factor 1/√nexp , which follows from the reduction of the variation of the mean of the noise disturbances based on nexp independent data sets.
Figure 4.10: Expected standard deviation due to disturbance induced variation of the
parameter ψ when combining multiple data sets. The results are based on a Monte-Carlo
simulation.
The set of all parameters that are to be identified is combined into one column vector ψ:
     ⎡ψ1 ⎤   ⎡ c ⎤
ψ =  ⎢ψ2 ⎥ = ⎢s0 ⎥ (4.49)
     ⎣ψ3 ⎦   ⎣v0 ⎦
The new cost function defined for the system identification problem based on one data set
ξA is now dependent on these three parameters ψ.
This makes it more difficult to analyse the topology of the cost function in order to get an
insight about potential local minima. If one of the parameters is set to a fixed value or
expressed as a combination of the other two, the objective function can be visualized as
a three dimensional surface plot over a two dimensional region of the parameter space.
This can be seen as a contour plot of ε(ψ1 , ψ2 ) for ψ1 ∈ [0, 1], ψ2 ∈ [25, 35] and ψ3 =
0 in Fig. 4.11 and for ψ1 = 0.5, ψ2 ∈ [25, 35] and ψ3 ∈ [−5, 5] in Fig. 4.12. These
two visualizations only represent two orthogonal slices through the three dimensional
parameter space. Nevertheless, the objective function appears to be monotonically falling
towards one global minimum. This means that neither larger deviations in the initial guess
for the parameter c nor for the initial conditions s0 or v0 should affect the results of the
identified parameter set.
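A MATLAB sketch of such a slice, evaluating the objective on a grid in (ψ1 , ψ2 ) = (c, s0 ) with ψ3 = v0 fixed to zero as in Fig. 4.11; the state-space form and the data xi_A follow the assumptions of the earlier sketches, with the initial conditions now taken from the parameter vector.

[P1, P2] = meshgrid(linspace(0, 1, 41), linspace(25, 35, 41));
E = zeros(size(P1));
for k = 1:numel(P1)
    yk   = modelOutput3([P1(k); P2(k); 0], ti, m, g);   % psi = [c; s0; v0]
    E(k) = sum((xi_A - yk).^2);
end
contour(P1, P2, E, 30); xlabel('\psi_1 = c'); ylabel('\psi_2 = s_0');

function y = modelOutput3(psi, ti, m, g)
    odefun = @(~, x) [x(2); -g + (psi(1)/m)*x(2)^2];    % same assumed dynamics as before
    [~, X] = ode45(odefun, ti, [psi(2); psi(3)]);       % s0 and v0 taken from psi
    y = X(:, 1);
end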
To evaluate the effect that the addition of the initial conditions to the parameter set has on
the results of the drag coefficient, another Monte-Carlo simulation was performed based
on the same data sets as in section 4.1.4. As the initial conditions were previously fixed at
the exact values and are now free to be estimated, it is to be expected that the disturbance
induced variation of the initial conditions will also have a negative effect on the estimated
drag coefficient c. The numerical optimization of εA (ψ) will use the freedom in the initial
conditions to get a better fit on the noisy ξA which results in a worse fit to the exact
system output yexct compared to the optimization of the cost function that was fixed to
the exact initial conditions as in section 4.1.4. The sample mean and standard deviation
of the identified parameters based on the results of N = 1000 identification procedures
are shown in table 4.4. Also listed are the p-values pKS of the Kolmogorov-Smirnov tests
performed on the normalized results.
As was expected, the standard deviation of the drag coefficient sψ1 is larger than in the
results of the Monte-Carlo simulation in section 4.1.4, where only c was identified. This
leads to the conclusion that it might be better to fix the initial conditions of the simulation
model to a best estimate based on the experimental setup instead of including them into
the identification parameter set, if this best estimate is more precise than the disturbance
induced variation caused by the additional freedom of the objective function.

Figure 4.11: Contour plot of the objective function ε(ψ) evaluated for a two dimensional slice of the parameter space: ψ1 ∈ [0, 1], ψ2 ∈ [25, 35] and ψ3 = 0.

Figure 4.12: Contour plot of the objective function ε(ψ) evaluated for a two dimensional slice of the parameter space: ψ1 = 0.5, ψ2 ∈ [25, 35] and ψ3 ∈ [−5, 5].

When dealing
with more accurate measurement data, the disturbance induced variation is less impactful.
Therefore it might be less likely to find a better estimate for the initial conditions based on
the experimental setup compared to the system identification of those initial conditions.
Table 4.4: Results of the Monte-Carlo simulation of the system identification of the pa-
rameter and the initial conditions of free fall experiment.
This seemingly simple system can be used to investigate a range of different nonlinear
systems. Both the force of the spring Fsp as well as the friction force Ff can be modeled as
linear or with varying degrees of nonlinearity [35]. For example the following equations
describe a linear, a hardening, and a softening spring respectively.
Fsp = kx (4.53)
Fsp = k[1 + (hx)2 ]x (4.54)
Fsp = k[1 − (sx)2 ]x (4.55)
This nonlinear behaviour is illustrated in Fig. 4.13 for a spring coefficient k = 64 N/m
and a hardening or softening coefficient of h = s = 2 m−1 respectively. Of course, any
other relation between the displacement of the mass x and the spring force Fsp in the form of
a function could be used as well.
The general form of the so called Duffing equation combines a linear and a cubic restoring force term with a linear damping term.

Figure 4.13: Spring characteristic for a linear, a hardening and a softening spring.

Assuming that the friction force is linear viscous, Ff = bẋ(t), the equation of motion of the mass with a hardening spring results in an equation of this form, for example
mẍ + k[1 + (hx)2 ]x + bẋ = F. (4.57)
For any given initial conditions x(t0 ) and any input u(t) an approximate solution of the initial value problem can be computed by numerical integration, e.g. by applying the MATLAB® function ode45().
The system output is sampled every tstep = 0.02 s. Each experiment lasts ten seconds:
[t0 , t f ] = [0 s, 10 s].
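A minimal MATLAB sketch of this simulation model, with the hardening spring system (4.57) written in first-order form; the mass m = 1 kg and the sinusoidal excitation are illustrative assumptions, not the actual inputs uA (t) and uB (t) used in the thesis.

k = 64; h = 2; b = 0.1; m = 1;                  % k [N/m], h [1/m], b [kg/s], m [kg] (assumed)
u  = @(t) 5*sin(2*pi*0.5*t);                    % hypothetical excitation force [N]
f  = @(t, x) [x(2); (u(t) - b*x(2) - k*(1 + (h*x(1))^2)*x(1))/m];
ts = (0:0.02:10)';                              % sampling every 0.02 s over [0 s, 10 s]
[~, X] = ode45(f, ts, [0; 0]);                  % zero initial conditions assumed
y = X(:, 1);                                    % sampled system output: position x(t)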
The parameters that are to be identified are the model parameters p as well as the initial
conditions of the respective experiment x0,A and x0,B . This can be summarized as an
identification parameter vector
ψ = [pᵀ  x0,Aᵀ  x0,Bᵀ ]ᵀ . (4.64)
The synthetic measurement data sets ξA and ξB are generated by adding Gaussian mea-
surement noise, sampled from the normal distribution N (μ = 0, σ = 0.01), to the re-
sponse of the system to uA (t) and uB (t) under the exact parametrization
pexct = [kexct  hexct  bexct ]ᵀ = [64 N/m  2 m−1  0.1 kg/s]ᵀ (4.65)
The objective functions that are to be minimized for the purpose of system identification are defined analogously to the previous sections as the output errors with respect to ξA , ξB and both data sets combined.
Figure 4.14: System inputs uA (t) and uB (t) for the experiments A and B simulated on the
mass and hardening spring system.
Fig. 4.15 shows the identification data set ξA as well as the exact system output yA (ψexct )
on which that data is based, and a comparison of the estimated system outputs yA (ψA ),
yA (ψB ) and yA (ψAB ) based on the results of the system identification procedure. Fig.
4.16 shows the same for experiment B.

Figure 4.15: Comparison of the estimated system outputs of experiment A based on the results ψA , ψB and ψAB of the system identification.

As can be seen, the estimated system output of
yA (ψA ) and yB (ψB ) fit rather nicely to the respective output of the exactly parametrized
system. However, the estimate of yA based on ψB as well as the estimate of yB based on
ψA results in rather poor performance. The estimates based on ψAB for each experiment
are similar to both yA (ψA ) and yB (ψB ).
Figure 4.16: Comparison of the estimated system outputs of experiment B based on the
results ψA , ψB and ψAB of the system identification.
As was already mentioned in section 3.2.1 and section 4.1, a visual analysis of the topol-
ogy of an objective function ε(ψ) in a more than two-dimensional parameter space ψ
is difficult. However, conclusions can be drawn from the results of the numerical op-
timization, especially when combining a larger number of results, as for example after a Monte-Carlo simulation.
Figure 4.19: Comparison of the results of the Monte-Carlo simulation in the form of
histograms for the identified parameter k based on the identification data generated from
experiment A and B as well as both of them combined.
Figure 4.20: Histogram of the identified parameter kAB after the removal of outliers.
In contrast, the histogram of kAB shows hardly any outliers. Based on the histogram of kAB shown in Fig. 4.19, a range of 62 N/m < kAB,s < 66 N/m can be selected, which does not seem to contain any obvious outliers. ns = 995 of the n = 1000 results of the Monte-Carlo simulation fall into
that range. Fig. 4.20 shows a detailed histogram of these selected parameters kAB,s as well
as the normal distribution based on the sample mean k̄AB,s and sample standard deviation
skAB,s .
When trying to differentiate between outliers due to wrong convergence and disturbance induced variation, a wider range has to be selected for the results of kA and kB , e.g.
60 N/m < kA,s , kB,s < 68 N/m. Table 4.5 shows the results of a Kolmogorov-Smirnov test
performed on the normalized data set of the selected parameter sets kA,s , kB,s and kAB,s in
the form of p-values pKS . The data sets were normalized according to the sample mean
k̄ and sample standard deviation sk , which are also shown in the table below. Performing
the test on the whole parameter sets including all outliers failed for kA , kB as well as for
kAB (pKS << 1%).
Table 4.5: Results of the Monte-Carlo simulation for the nonlinear mass and spring system
after the removal of outliers.
Figure 4.21: Comparison of the system response to the verification test case with input
uV (t). The system is parameterized according to the selected results ψA,s , ψB,s and ψAB,s
as well as ψexct .
Fig. 4.21 shows a qualitative visualisation of the fact that the identification based on
two data sets ξA and ξB leads to better results than the identification based on a single data
set. Here the system response to a verification test case is presented. The input uv (t) is a
multi-step function
         ⎧  10   if t < t f /3
uv (t) = ⎨ −10   if t f /3 ≤ t < 2t f /3 (4.74)
         ⎩  20   if t ≥ 2t f /3
for t ∈ [t0 , t f ]. The presented system responses correspond to all the parameter sets ψA , ψB
and ψAB , which were identified in the Monte-Carlo simulation excluding those outliers
that were detected in the previous look at the histogram of the identified spring stiffness
k. As can be seen, the estimation of the system output yv (ψAB ) results in the best approx-
imation of the behaviour of the exactly parametrized system, even though the excitation
level of this verification test case lies well above that of experiments A and B.
A wrong convergence of the numerical optimization may require another try at the parameter estimation after selecting a new, potentially better initial
guess. For system identification problems, where it is difficult to come up with a well-
founded initial guess ψ0 , a multiple starting point procedure can be applied. This means
that a set of initial guesses is selected and the results of each of those can be used in order
to come to a consensus on local and global minima or to at least achieve a single correct
convergence to the global minimum.
In this section the application of a multiple starting point procedure for the identification of the nonlinear mass and spring system based on two experiments A and B is investigated. For this purpose the same data sets ξA and ξB as in section 4.2.2 are used. The parameters that are to be determined are
ψ = [k  h  b  x0,A  ẋ0,A  x0,B  ẋ0,B ]ᵀ . (4.75)
This means that there are 784 different combinations for the initial guess ψ0 in the nu-
merical minimization of εAB (ψ), as defined in equation 4.69.
In this example problem every initial guess ψ0,i led to convergence to an identified pa-
rameter set ψAB,i . This does not need to be the case in every problem. For example the
ODE-solver might fail to compute a solution for certain parameter sets. This needs to be
handled in the algorithm of the multiple starting point procedure.
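A sketch of such a multiple starting point loop with basic failure handling in MATLAB: each starting point is tried inside a try/catch block so that a failing ODE solution or optimizer run does not abort the whole procedure. Here Psi0 is assumed to be a matrix whose columns are the candidate initial guesses, and residAB(psi) a function returning the stacked residual vector whose squared norm is εAB (ψ) of equation 4.69.

nStart  = size(Psi0, 2);
results = nan(size(Psi0, 1), nStart);
fvals   = inf(1, nStart);
opts    = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt', 'Display', 'off');
for i = 1:nStart
    try
        [results(:, i), fvals(i)] = lsqnonlin(residAB, Psi0(:, i), [], [], opts);
    catch
        % integration or optimization failed for this starting point; skip it
    end
end
selected = fvals < 0.07;    % goodness-of-fit threshold, cf. the discussion below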
In order to evaluate the goodness-of-fit of each result, the objective function value εAB (ψAB ) is looked at. Fig. 4.22 presents εAB (ψAB,i ) on a logarithmically scaled vertical axis over the index i, which corresponds to a unique initial guess ψ0,i that led
to the result ψAB,i . As can be seen, there are accumulations of results on different lev-
els of the objective function value. These probably correspond to different local minima.
A majority of the results seems to approximately lie on the same lowest level which is
clearly separated from the other results. Therefore, a simple way of trying to distinguish
between parameter sets that correspond to the global minimum of εAB (ψ) and ones that
belong to different local minima would be to put a threshold on the goodness-of-fit. In this
case a threshold value of εthresh = 0.07 is selected. This results in 682 out of the 784 parameter sets which correspond to ε(ψ) < εthresh .

Figure 4.22: Comparison of the objective function values ε(ψAB ) for each identified parameter set ψAB based on a different initial guess that led to a convergence of the numerical optimization.

Fig. 4.23 shows the estimated system output based on these selected parameter sets compared to the identification data and the response of
the exactly parametrized system. As can be seen, all identified parameter sets with objec-
tive function values below the selected threshold result in practically identical system behaviour.
Figure 4.23: Comparison of the simulation output of experiment A and B for all 682
identified parameter sets corresponding to ε(ψ) < εthresh compared to the identification
data.
Once the body is in motion, a simpler description is given by the dynamic friction coefficient μ < μ0 :
Ff = μFN sign(ẋ) if ẋ ≠ 0. (4.84)
If a moving body is in contact with a viscous medium like gases or fluids, a nonlinear viscous friction term of the form Ff = b |ẋ|^δẋ sign(ẋ) can be added, which simplifies to
Ff = bẋ (4.86)
for linear viscous friction (δẋ = 1). The combination of these effects is defined in the
Stribeck friction model:
Ff = μFN sign(ẋ) + bẋ + Fs (ẋ) (4.87)
where Fs (ẋ) is a function describing the Stribeck effect [37]. In some simulation environ-
ments the characteristic (v, Ff ) curve resulting from the Stribeck effect is implemented as
a lookup table [38]. The accurate representation in the form of mathematical models has
been attempted in different ways, one of which is the LuGre friction model described in the next section.
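As a small illustration of the static characteristic described by equation (4.87), the following MATLAB sketch plots a (v, Ff ) curve. The Stribeck term Fs (ẋ) is modelled here as an exponentially decaying function of the velocity, which is one common choice; the thesis leaves Fs (ẋ) general, and the parameter values are purely illustrative.

mu = 0.10; mu0 = 0.15; FN = 9.81; b = 0.4; vs = 0.01;   % assumed values, FN for a 1 kg mass
v  = linspace(-0.1, 0.1, 1001);
Fs = (mu0 - mu)*FN*exp(-(v/vs).^2).*sign(v);            % assumed Stribeck term Fs(v)
Ff = mu*FN*sign(v) + b*v + Fs;                          % static friction curve, eq. (4.87)
plot(v, Ff); xlabel('v [m/s]'); ylabel('F_f [N]');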
By further adding a linear viscous friction term proportional to the velocity of the body v(t) as in equation (4.86), the dynamic friction force acting on the body is described as
Ff = σ0 z(t) + σ1 ż(t) + b v(t) (4.89)
with the internal bristle state z(t) governed by
ż(t) = ∂z/∂t = v(t) − (|v| / g(v)) z(t) (4.90)
with g(v) describing the transition from static to dynamic Coulomb friction as an exponentially decaying function of v(t)²:
σ0 g(v) = FC + (FS − FC ) e^−(v/vs )² (4.91)
where FS = μ0 FN is the static friction force and FC = μFN is the dynamic Coulomb friction force. The Stribeck velocity vs determines the rate at which g(v) transitions from FS /σ0 to FC /σ0 with increasing velocity v(t).
The motion of the body under the applied force u(t) is governed by m ẍ(t) = u(t) − Ff (t), where x(t) is the position of the mass m. The friction force Ff is modelled according
to the LuGre model including the linear viscous term. The exact system parameters are
assumed to be
m = 1 kg . . . mass
σ0 = 1 × 10⁵ N/m . . . stiffness of the bristle
vs = 0.01 m/s . . . Stribeck velocity
b = 0.4 kg/s . . . linear viscous friction coefficient
μ0 = 0.15 . . . static Coulomb friction coefficient
μ = 0.10 . . . dynamic Coulomb friction coefficient
The damping coefficient σ1 of the bristle is set to critical damping as suggested by [39]:
σ1 = 2 √(σ0 m) (4.93)
Figure 4.24: Triangular signal u(t) that is applied to the LuGre friction model in order to
analyse its properties.
The LuGre friction model is able to produce system behaviour which is similar to experi-
mental observations of real world friction characteristics [37, 39, 40]. The LuGre model
accounts for micro or pre-sliding displacements corresponding to an applied excitation
force u ≪ μFN as well as friction hysteresis of varying form in cyclic processes, which depends not only on the parametrization but also on the rate of change, due to the nature of the dynamic friction model.
For the purpose of identifying the system parameters of the LuGre friction model, it might
therefore be important to design experiments that result in measurement data which gives
insight into these different behaviours at smaller and higher excitation levels of the sys-
tem.
A downside of the LuGre model is that it can pose difficulties in the numerical integration due to the stiff parametrization of the bristle [37, 38]. Therefore it would be advantageous if the experimental design results in a setup that is as easy as possible to solve numerically.
To investigate the behaviour of the model, a triangular signal u(t) as shown in Fig. 4.24
with varying amplitude û is applied to the system. A single cycle of u(t) is looked at with
a period of five seconds.
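A minimal MATLAB sketch of this simulation, using the stiff solver ode15s because of the stiff bristle dynamics. The friction force follows the LuGre model with viscous term as in equations (4.89) and (4.90); the single triangular pulse used here is only a simple stand-in for the signal shown in Fig. 4.24, and σ1 follows the critical damping choice of (4.93).

m = 1; sig0 = 1e5; vs = 0.01; b = 0.4; mu0 = 0.15; mu = 0.10; g = 9.81;
FN = m*g; FS = mu0*FN; FC = mu*FN;
sig1 = 2*sqrt(sig0*m);                                 % critical bristle damping
uhat = 0.5*mu0*m*g;                                    % amplitude below the static friction level
u  = @(t) uhat*max(0, 1 - abs(t - 2.5)/2.5);           % stand-in triangular pulse over 5 s
gv = @(v) (FC + (FS - FC)*exp(-(v/vs).^2))/sig0;       % g(v) from eq. (4.91)
f  = @(t, x) lugreRHS(t, x, u, gv, m, sig0, sig1, b);
[t, X] = ode15s(f, [0 5], [0; 0; 0]);                  % states: position, velocity, bristle state z

function dx = lugreRHS(t, x, u, gv, m, sig0, sig1, b)
    v  = x(2);  z = x(3);
    dz = v - abs(v)/gv(v)*z;                           % bristle dynamics, eq. (4.90)
    Ff = sig0*z + sig1*dz + b*v;                       % LuGre friction force with viscous term
    dx = [v; (u(t) - Ff)/m; dz];
end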
Fig. 4.25 shows the system response to the described input signal for a selection of small
amplitudes û < μ0 mg. These results show small pre-sliding displacements x(t), which
the LuGre model allows for due to the fact that it is not perfectly rigid before the static
friction force is overcome. Fig. 4.26 shows the system response to larger input signals
û ≥ μ0 mg. As can be seen, once the static friction force is overcome, the system exhibits
macro level displacements x(t). A small displacement goes together with a velocity v > 0,
which lowers the friction force due to the Stribeck effect, which results in faster accel-
eration. As can be seen by comparing the two figures, the system response for different
excitation levels may lie on a completely different magnitude.
The important insight to take away from this is that there exists a sort of boundary be-
tween the system responding with micro or macro level displacements, which is related
to the applied excitation force u(t) but which of course also depends on the system parame-
ters.
For example, a smaller bristle stiffness σ0 would result in a larger pre-sliding displace-
ment and therefore a higher velocity v(t) as a response to the same input force u(t). A
smaller vs would result in a more rapid lowering of the friction force with rising velocity
v(t). These parameter changes would result in the overcoming of the static friction force
at lower input levels. Of course a lower μ0 would result in the same effect.
During the system identification procedure, starting from an initial guess ψ0 , a numeri-
cal optimization algorithm iterates over the parameter space in order to find the optimal
parameter set, i.e. the best fit of the model output y(ψ) to the measured system re-
sponse ξ. When the measurement data corresponds to macro level displacements, but the
initial guess results in micro level displacements or the other way around, correct conver-
gence might be poor or even impossible. Therefore one has to be careful when working
with identification data sets that lie close to that boundary, i.e. where small variations in
the parameters cause the overcoming of that boundary and consequently a drastic change
in the system behaviour.
To conclude the discussion of the system behaviour at varying excitation levels the follow-
ing figures present the (x, Ff ) and (v, Ff ) characteristic for the same triangular waveform
inputs u(t) as discussed so far. Ff was computed according to equation (4.89). Fig. 4.27
and Fig. 4.28 show the friction hysteresis (x, Ff ) at lower and higher excitation of the
system. Fig. 4.29 shows the (v, Ff ) hysteresis for small excitation levels as well as one
flaw of the LuGre model. As the friction is modelled as a dynamic system via the state
z(t), the friction force Ff does not always point against the direction of motion sign(v(t)).
Fig. 4.30 shows the (v, Ff ) characteristic for higher excitation levels. This results in the
typical Stribeck curve where Ff starts at μ0 mg, sharply falls off to μmg as the motion
begins and rises with the viscous friction term bv(t).
Figure 4.25: System response x(t) to the triangular input signal u(t) at smaller excitation
levels.
Figure 4.26: System response x(t) to the triangular input signal u(t) at larger excitation
levels.
Figure 4.27: (x(t), Ff (t)) characteristic for smaller excitation levels u(t) < μ0 mg.
Figure 4.28: (x(t), Ff (t)) characteristic for higher excitation levels u(t) > μ0 mg.
Figure 4.29: (v(t), Ff (t)) characteristic for smaller excitation levels u(t) < μ0 mg.
Figure 4.30: (v(t), Ff (t)) characteristic for higher excitation levels u(t) > μ0 mg.
To examine the effect that a small variation of the parameters has on the system response, the Jacobian J(ψ) of the system response x(t) is now approximated numerically. The parameters under investigation are
ψ = [b  σ0  vs  μ0  μ]ᵀ . (4.94)
The Jacobian is approximated by a central finite difference formula for small changes around ψexct as specified at the beginning of this section (4.3.3). As some of the parameters lie on different orders of magnitude compared to others, an individual step size of Δψi = 1 % ψi was selected for each parameter ψi .
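A MATLAB sketch of this central finite difference approximation; simulateResponse(psi, t) is a hypothetical helper that is assumed to return the sampled position trajectory x(t) of the LuGre system for the parameter vector ψ = [b, σ0 , vs , μ0 , μ]ᵀ.

psi = [0.4; 1e5; 0.01; 0.15; 0.10];                    % exact parameter values from above
t   = (0:0.02:5)';                                     % evaluation times of the response
J   = zeros(numel(t), numel(psi));
for j = 1:numel(psi)
    dpsi    = zeros(size(psi));
    dpsi(j) = 0.01*psi(j);                             % individual step size of 1 % per parameter
    J(:, j) = (simulateResponse(psi + dpsi, t) ...
             - simulateResponse(psi - dpsi, t)) / (2*dpsi(j));   % central difference column
end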
For a small excitation with the triangular signal u(t) with amplitude û = 0.5μ0 mg the Ja-
cobian is presented in Fig. 4.31. For this input, the system shows a larger change in the
system response for a variation of σ0 and μ0 . The variation of the other parameters shows
less of an effect. For clearer visibility, these components of the Jacobian are shown in
detail in Fig. 4.32. Note that the variation of vs shows a similar but not the same change
in the response as a variation of μ.
Fig. 4.33 shows the numerically approximated Jacobian for the same input form but with
a higher amplitude û = 1.5μ0 mg. As can be seen, for this test case the variation of σ0
results in the smallest variation of the response.
Based on this analysis it can be assumed that the system identification based on pre-sliding
displacement data could result in better estimation of σ0 compared to macro level dis-
placement data due to the larger effect of small variations on the value of the objective
function.
Figure 4.31: Numerically approximated Jacobian for a triangular input signal u(t) with
û = 0.5μ0 mg.
Figure 4.32: Selected components of the numerically approximated Jacobian for a trian-
gular input signal u(t) with û = 0.5μ0 mg.
Figure 4.33: Numerically approximated Jacobian for a triangular input signal u(t) with
û = 1.5μ0 mg.
with
        ⎡ bexct   ⎤   ⎡ 0.4 kg/s      ⎤
        ⎢ σ0,exct ⎥   ⎢ 1 × 10⁵ N/m   ⎥
ψexct = ⎢ vs,exct ⎥ = ⎢ 1 × 10⁻² m/s  ⎥ (4.96)
        ⎢ μ0,exct ⎥   ⎢ 0.15          ⎥
        ⎣ μexct   ⎦   ⎣ 0.10          ⎦
In the previous section the cyclic behaviour of the nonlinear friction model was investi-
gated. Testing has shown that attempting the system identification based on a corresponding input-output data set results in problems in the convergence of the nonlinear curve fitting problem. Therefore, in this section a slightly simplified excitation force u(t) that only points in one direction is selected for the experimental design. This eliminates zero cross-
ings from the state trajectory of the position x(t).
Figure 4.34: Triangular input signal uA (t) with û = 0.5μ0 mg that is applied to the LuGre
friction model in order to generate identification data.
An identification data set ξA is generated based on a triangular input signal uA (t) con-
sisting of a single peak with amplitude ûA = 0.5μ0,exct mg and a period of five seconds
as shown in Fig. 4.34. This input force will not be high enough to overcome the static
friction. Therefore the movement x(t) is limited to pre-sliding displacement.
For the identification the system response is sampled every tstep = 0.2 s. The response of
an arbitrarily parametrized system to the described input uA (t) at these points in time is
denoted as yA (ψ). The identification data set ξA is generated by adding gaussian mea-
surement noise δA , which is sampled from the normal distribution N (0, 2.5 × 10−7 ), to
the response of the exactly parametrized system:
ξA = yA (ψexct ) + δA . (4.97)
The resulting data points can be seen in Fig. 4.34. The initial guess for the system
parameters is selected as
ψ0 = [0.3 kg/s  0.75 × 10⁵ N/m  2 × 10⁻² m/s  0.12  0.12]ᵀ . (4.98)
As the system response to the input uA (t) lies on a relatively small scale, the objective function is defined as
εA (ψ) = ‖WA (ξA − yA (ψ))‖₂² , (4.99)
Figure 4.35: Identification data set generated by adding noise to the sampled response of
the exactly parameterized system to the input signal uA (t).
WA = (1 / max(ξA )) IN (4.100)
with IN as an identity matrix of size N, the number of data points. Additionally, in order to further improve the convergence of the numerical optimization, the parameter space was normalized by a linear transformation. As can be seen from the identified parameter set, the bristle stiffness σ0 , the Stribeck velocity vs and the static friction coefficient μ0 are identified accurately. However, this cannot be said for the identified values of the viscous friction coefficient b and the dynamic friction coefficient μ.
Figure 4.36: Identification data set generated by adding noise to the sampled response of
the exactly parametrized system to the input signal uB (t).
A second identification data set ξB is generated based on a triangular input signal uB (t).
The input has the same form as uA (t), which was shown in Fig. 4.34, consisting of a
single peak but with larger amplitude ûB = 1.5μ0,exct mg, which will overcome the static
friction and lead to macro level displacements. The input is again applied over a period
of five seconds.
As previously, for the identification the system response is sampled every tstep = 0.2 s.
The response of an arbitrarily parametrized system to the described input uB (t) at these
points in time is denoted as yB (ψ). In order to achieve a certain comparability to the
results of the system identification based on the pre-sliding experiment, a similar signal
to noise ratio is selected. The identification data set ξB is generated by adding gaussian
measurement noise δB , which is sampled from the normal distribution N (0, 2.5 × 10−2 ),
to the response of the exactly parametrized system:
ξB = yB (ψexct ) + δB . (4.103)
The resulting data points can be seen in Fig. 4.36. The initial guess for the system
parameters is selected as ψ0 as before. The objective function that needs to be optimized in order to solve this system identification problem is defined analogously to equation (4.99) as εB (ψ) = ‖WB (ξB − yB (ψ))‖₂² with the weighting matrix
WB = (1 / max(ξB )) IN (4.105)
with IN as an identity matrix of size N, the number of data points. The same rescaling of the
parameter space as before was applied again. Applying a Levenberg-Marquardt algorithm
to minimize εB (ψ) results in the identified parameter set
     ⎡ 0.456 503 27 kg/s  ⎤
     ⎢ 67 374.558 N/m     ⎥
ψB = ⎢ 0.010 472 251 m/s  ⎥ . (4.106)
     ⎢ 0.153 278 04       ⎥
     ⎣ 0.095 141 50       ⎦
For this setup, the viscous friction coefficient b was identified more accurately, as can be
expected for a data set corresponding to a movement at higher velocities. The dynamics of the bristle, represented by the bristle stiffness σ0 , have only minor effects on the macro level motion of the system and are overshadowed by the measurement noise. Therefore
this parameter was not identified very accurately. The static as well as the dynamic fric-
tion coefficient μ0 and μ were both identified with reasonable precision.
As has already been discussed in section 4.1.5, multiple identification data sets can be
used simultaneously in order to make a more accurate prediction of the exact system
parameters. In this section the observations of micro level displacements ξA and of macro
level displacements ξB are utilized and the feasibility of overcoming the shortcomings in
the identified parameter sets when using those data sets individually is investigated.
The objective function for the identification of the parameters ψ based on two data sets can be defined as
εAB (ψ) = ‖ [ξA ; ξB ] − [yA (ψ); yB (ψ)] ‖₂² . (4.107)
As the system responses to uA (t) and uB (t) lie on a completely different scale an individual
weighting for the corresponding data sets is added to the objective function.
εAB (ψ) = ‖ [WA 0; 0 WB ] ( [ξA ; ξB ] − [yA (ψ); yB (ψ)] ) ‖₂² (4.108)
with the same weighting matrices WA and WB that were defined in the previous sections. Again, a Levenberg-Marquardt algorithm was used to minimize this objective function, which results in the identified parameter set ψAB . Compared to the exact parameters, these results are relatively good estimates of σ0 , μ0 and μ, while vs and b are identified less accurately.
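A sketch of this combined, weighted curve fit in MATLAB, using lsqnonlin with the Levenberg-Marquardt algorithm. Here yA(psi) and yB(psi) are assumed to be functions returning the sampled model responses to uA (t) and uB (t), xiA and xiB the two identification data sets, and psi0 the initial guess from (4.98); the scalar weights are equivalent to the matrices WA and WB .

wA = 1/max(xiA);  wB = 1/max(xiB);                           % weights, cf. (4.100) and (4.105)
residAB = @(psi) [wA*(xiA - yA(psi)); wB*(xiB - yB(psi))];   % weighted stacked residual
opts  = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
psiAB = lsqnonlin(residAB, psi0, [], [], opts);              % combined identification result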
The resulting estimated system output for ψA , ψB and ψAB compared to the identification
data and the system response with exact parametrization ψexct can be seen in Fig. 4.37
for uA (t) and Fig. 4.38 for uB (t). While the parameter set ψAB identified based on both
data sets ξA and ξB does not produce the best performance in the test case A, it is still a
major improvement compared to the evaluation of the test case A with the parameter set
ψB or ψA in the test case B.
To summarize, the case study of the LuGre friction model clearly demonstrates the pos-
sible difficulty that can be encountered when only limited identification data is available
for the system identification of nonlinear dynamic systems. The correct estimation of the
model parameters is not necessarily only a question of achieving a high enough excitation
in the experiments that are used to extract information about the object of interest. It is
rather dependent on getting enough data at the right levels of system excitation. All in
all, it was shown that different identification data sets can be combined in a system identification procedure in order to compute an improved estimate of all the model
parameters.
Figure 4.37: Comparison of the system response to the input signal uA (t) for the exact
and the identified parameter sets.
Figure 4.38: Comparison of the system response to the input signal uB (t) for the exact
and the identified parameter sets.
Chapter 5
Conclusion
In the first case study of the free fall experiment, the nonlinear curve fitting problem is well posed as the objective function appears to be of
convex shape over a wide range of the parameter space around the global minimum. The
influence of Gaussian measurement noise on the identified parameters was investigated. This resulted in the conclusion that, for this example problem, Gaussian measurement noise results in Gaussian disturbance induced variation of the identified parameters. However, it is important to keep in mind that this will not be the case for every nonlinear
system identification problem. Furthermore, the compensation of the disturbance induced
variation by means of utilizing multiple identification data sets was investigated. This
resulted in the conclusion that, based on the results of a Monte-Carlo simulation, the
standard deviation of the identified parameters can be reduced by a factor of the inverse
of the square root of the number of data sets incorporated into the system identification
procedure.
In the case study of the nonlinear mass and spring system the utilization of observational
data from two different experimental setups in the system identification of both the model
parameters as well as the initial conditions of each experiment was investigated. This
example problem shows that the inclusion of multiple different data sets not only leads
to better results in terms of the disturbance induced variation of the identified parameters,
but can also improve the convergence of the nonlinear curve fitting problem. In addition
to that, the application of a multiple starting point procedure in order to find a way around
local minima to a global minimizer was presented.
In the third case study, the system identification of the LuGre friction model demanded
special attention to the experimental design, as the system exhibits a vastly different be-
haviour depending on the excitation level. As was shown, by combining identification
data sets corresponding to micro and macro level excitations of the system, a proper esti-
mation of all the model parameters can be achieved. Even though the technical feasibility
of executing these proposed experiments on the same system might cause practical prob-
lems, the advantage of combining the information contained in multiple data sets in order
to accurately identify a set of model parameters was clearly demonstrated.
Bibliography
[1] M. Bunge. Emergence and Convergence: Qualitative Novelty and the Unity of
Knowledge. Toronto studies in philosophy. University of Toronto Press, 2003.
[2] Lennart Ljung and Torkel Glad. Modeling of Dynamic Systems. Prentice-Hall, Inc.,
USA, 1994.
[3] Gene F. Franklin, Michael L. Workman, and Dave Powell. Digital Control of Dy-
namic Systems. Addison-Wesley Longman Publishing Co., Inc., 3rd edition, 1997.
[6] Amar G. Bose. A Theory of Nonlinear Systems. PhD thesis, Massachusetts Institute
of Technology, Dept. of Electrical Engineering, June 1956.
[7] G. Rath and M. Harker. Global least squares solution for lti system response. In
2017 International Conference on Applied Electronics (AE), pages 1–4, Sep. 2017.
[8] P. O’Leary and M. Harker. A framework for the evaluation of inclinometer data in
the measurement of structures. IEEE Transactions on Instrumentation and Mea-
surement, 61(5):1237–1251, May 2012.
[9] M. Harker and G. Rath. Global least squares for time-domain system identifica-
tion of state-space models. In 2018 7th Mediterranean Conference on Embedded
Computing (MECO), pages 1–6, 2018.
[12] Lennart Ljung. Some aspects on nonlinear system identification. IFAC Proceed-
ings Volumes, 39(1):553 – 564, 2006. 14th IFAC Symposium on Identification and
System Parameter Estimation.
[13] Johan Schoukens and Lennart Ljung. Nonlinear system identification: A user-
oriented roadmap. ArXiv, abs/1902.00683, 2019.
[14] L. A. Zadeh. From circuit theory to system theory. Proceedings of the IRE,
50(5):856–865, 1962.
[15] José Borges, Vincent Verdult, Michel Verhaegen, and Miguel Ayala Botto. Separable
least squares for projected gradient identification of composite local linear state-
space models. 07 2004.
[17] Rolf Isermann and Marco Münchhof. Identification of Dynamic Systems: An Intro-
duction with Applications. Springer Publishing Company, Incorporated, 2014.
[18] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.
Numerical Recipes in C (2nd Ed.): The Art of Scientific Computing. Cambridge
University Press, USA, 1992.
[19] W. Richard Kolk and Robert A. Lerman. Analytic Solutions to Nonlinear Differential
Equations, pages 23–60. Springer US, Boston, MA, 1992.
[20] R. Ashino, M. Nagase, and R. Vaillancourt. Behind and Beyond the Matlab ODE
Suite. Computers and Mathematics with Applications, 40(4):491 – 512, 2000.
[22] Lawrence F. Shampine and Mark W. Reichelt. The MATLAB ODE suite. SIAM J. Sci.
Comput., 18(1):1–22, January 1997.
[23] Kaj Madsen, Hans Bruun Nielsen, and Ole Tingleff. Methods for
Non-Linear Least Squares Problems (2nd ed.). 2004. PDF available at http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3215/pdf/imm3215.pdf (2020/02/11).
[24] Kenneth Levenberg. A method for the solution of certain non-linear problems in
least squares. Quarterly of Applied Mathematics, 2(2):164–168, jul 1944.
[26] C.T. Kelley. Iterative Methods for Optimization. Frontiers in Applied Mathematics.
Society for Industrial and Applied Mathematics, 1999. PDF available at https://archive.siam.org/books/textbooks/fr18_book.pdf (2020/01/17).
[27] Henri P. Gavin. The levenberg-marquardt method for nonlinear least squares curve-
fitting problems. Available at http://people.duke.edu/~hpgavin/ce281/lm.pdf (2019/12/06), 2013.
[28] John A. Nelder and Roger Mead. A simplex method for function minimization.
Computer Journal, 7:308–313, 1965.
[29] Jeffrey Lagarias, James Reeds, Margaret Wright, and Paul Wright. Convergence
properties of the nelder–mead simplex method in low dimensions. SIAM Journal on
Optimization, 9:112–147, 12 1998.
[30] G.H. Golub and Victor Pereyra. The differentiation of pseudo-inverses and nonlin-
ear least squares problems whose variables separate. SIAM Journal on Numerical
Analysis, 10:413–432, 04 1973.
[31] Gene Golub and Victor Pereyra. Separable nonlinear least squares: the variable
projection method and its applications. Inverse Problems, 19:R1–R26(1), 01 2003.
[32] Dianne P. O’Leary and Bert W. Rust. Variable projection for nonlinear least squares
problems. Computational Optimization and Applications, 54(3):579–593, Apr 2013.
[33] J. Bruls, C.T. Chou, B.R.J. Haverkamp, and M. Verhaegen. Linear and non-linear
system identification using separable least-squares. European Journal of Control,
5(1):116 – 128, 1999.
[35] H.K. Khalil. Nonlinear Control, Global Edition. Pearson Education Limited, 2015.
[37] V. van Geven. A study of friction models and friction compensation. Available at
http://www.mate.tue.nl/mate/pdfs/11194.pdf (2019/12/18), 2009.
traineeship report, Technical University Eindhoven, Department Mechanical Engi-
neering, Dynamics and Control Technology Group.
[38] Andreas Krämer and Joachim Kempkes. Modellierung und Simulation von nichtlinearen Reibungseffekten bei der Lageregelung von Servomotoren. FHWS Science
Journal, 1(2):47 – 57, 2013.
[39] Carlos Canudas de Wit, Henrik Olsson, Karl Johan Åström, and Pablo Lischinsky.
A new model for control of systems with friction. IEEE Transactions on Automatic Control, 40(3), 1995.
[40] T. Piatkowski. Dahl and LuGre dynamic friction models – the analysis of selected
properties. Mechanism and Machine Theory, 73:91 – 100, 2014.