15.5 Nonlinear Models
We now consider fitting when the model depends nonlinearly on the set of $M$ unknown parameters $a_k$, $k = 1, 2, \ldots, M$. We use the same approach as in previous sections, namely to define a $\chi^2$ merit function and determine best-fit parameters by its minimization. With nonlinear dependences, however, the minimization must proceed iteratively. Given trial values for the parameters, we develop a procedure that improves the trial solution. The procedure is then repeated until $\chi^2$ stops (or effectively stops) decreasing.
How is this problem different from the general nonlinear function minimization problem already dealt with in Chapter 10? Superficially, not at all: Sufficiently close to the minimum, we expect the $\chi^2$ function to be well approximated by a quadratic form, which we can write as

$$\chi^2(\mathbf{a}) \approx \gamma - \mathbf{d} \cdot \mathbf{a} + \tfrac{1}{2}\, \mathbf{a} \cdot \mathbf{D} \cdot \mathbf{a} \tag{15.5.1}$$

where $\mathbf{d}$ is an $M$-vector and $\mathbf{D}$ is an $M \times M$ matrix. (Compare equation 10.6.1.)
If the approximation is a good one, we know how to jump from the current trial parameters $\mathbf{a}_{\rm cur}$ to the minimizing ones $\mathbf{a}_{\rm min}$ in a single leap, namely

$$\mathbf{a}_{\rm min} = \mathbf{a}_{\rm cur} + \mathbf{D}^{-1} \cdot \left[ -\nabla \chi^2(\mathbf{a}_{\rm cur}) \right] \tag{15.5.2}$$

(Compare equation 10.7.4.)
On the other hand, (15.5.1) might be a poor local approximation to the shape of the function that we are trying to minimize at $\mathbf{a}_{\rm cur}$. In that case, about all we can do is take a step down the gradient, as in the steepest descent method (§10.6). In other words,

$$\mathbf{a}_{\rm next} = \mathbf{a}_{\rm cur} - {\rm constant} \times \nabla \chi^2(\mathbf{a}_{\rm cur}) \tag{15.5.3}$$

where the constant is small enough not to exhaust the downhill direction.
To use (15.5.2) or (15.5.3), we must be able to compute the gradient of the $\chi^2$ function at any set of parameters $\mathbf{a}$. To use (15.5.2) we also need the matrix $\mathbf{D}$, which is the second derivative matrix (Hessian matrix) of the $\chi^2$ merit function, at any $\mathbf{a}$.
Now, this is the crucial difference from Chapter 10: There, we had no way of directly evaluating the Hessian matrix. We were given only the ability to evaluate the function to be minimized and (in some cases) its gradient. Therefore, we had to resort to iterative methods not just because our function was nonlinear, but also in order to build up information about the Hessian matrix. Sections 10.7 and 10.6 concerned themselves with two different techniques for building up this information.
Here, life is much simpler. We know exactly the form of $\chi^2$, since it is based on a model function that we ourselves have specified. Therefore the Hessian matrix is known to us. Thus we are free to use (15.5.2) whenever we care to do so. The only reason to use (15.5.3) will be failure of (15.5.2) to improve the fit, signaling failure of (15.5.1) as a good local approximation.
Calculation of the Gradient and Hessian
The model to be fitted is

$$y = y(x; \mathbf{a}) \tag{15.5.4}$$

and the $\chi^2$ merit function is

$$\chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[ \frac{y_i - y(x_i; \mathbf{a})}{\sigma_i} \right]^2 \tag{15.5.5}$$
The gradient of $\chi^2$ with respect to the parameters $\mathbf{a}$, which will be zero at the $\chi^2$ minimum, has components

$$\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{N} \frac{[y_i - y(x_i; \mathbf{a})]}{\sigma_i^2}\, \frac{\partial y(x_i; \mathbf{a})}{\partial a_k} \qquad k = 1, 2, \ldots, M \tag{15.5.6}$$
Taking an additional partial derivative gives

$$\frac{\partial^2 \chi^2}{\partial a_k\, \partial a_l} = 2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ \frac{\partial y(x_i; \mathbf{a})}{\partial a_k}\, \frac{\partial y(x_i; \mathbf{a})}{\partial a_l} - [y_i - y(x_i; \mathbf{a})]\, \frac{\partial^2 y(x_i; \mathbf{a})}{\partial a_l\, \partial a_k} \right] \tag{15.5.7}$$
It is conventional to remove the factors of 2 by defining

$$\beta_k \equiv -\frac{1}{2} \frac{\partial \chi^2}{\partial a_k} \qquad \alpha_{kl} \equiv \frac{1}{2} \frac{\partial^2 \chi^2}{\partial a_k\, \partial a_l} \tag{15.5.8}$$
making $[\alpha] = \tfrac{1}{2} \mathbf{D}$ in equation (15.5.2), in terms of which that equation can be rewritten as the set of linear equations

$$\sum_{l=1}^{M} \alpha_{kl}\, \delta a_l = \beta_k \tag{15.5.9}$$
This set is solved for the increments $\delta a_l$ that, added to the current approximation, give the next approximation. In the context of least-squares, the matrix $[\alpha]$, equal to one-half times the Hessian matrix, is usually called the curvature matrix.
Equation (15.5.3), the steepest descent formula, translates to

$$\delta a_l = {\rm constant} \times \beta_l \tag{15.5.10}$$
Note that the components $\alpha_{kl}$ of the Hessian matrix (15.5.7) depend both on the first derivatives and on the second derivatives of the basis functions with respect to their parameters. Some treatments proceed to ignore the second derivative without comment. We will ignore it also, but only after a few comments.
Second derivatives occur because the gradient (15.5.6) already has a dependence on $\partial y / \partial a_k$, so the next derivative simply must contain terms involving $\partial^2 y / \partial a_l\, \partial a_k$. The second derivative term can be dismissed when it is zero (as in the linear case of equation 15.4.8), or small enough to be negligible when compared to the term involving the first derivative. It also has an additional possibility of being ignorably small in practice: The term multiplying the second derivative in equation (15.5.7) is $[y_i - y(x_i; \mathbf{a})]$. For a successful model, this term should just be the random measurement error of each point. This error can have either sign, and should in general be uncorrelated with the model. Therefore, the second derivative terms tend to cancel out when summed over $i$.
Inclusion of the second-derivative term can in fact be destabilizing if the model fits badly or is contaminated by outlier points that are unlikely to be offset by compensating points of opposite sign. From this point on, we will always use as the definition of $\alpha_{kl}$ the formula

$$\alpha_{kl} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ \frac{\partial y(x_i; \mathbf{a})}{\partial a_k}\, \frac{\partial y(x_i; \mathbf{a})}{\partial a_l} \right] \tag{15.5.11}$$
This expression more closely resembles its linear cousin (15.4.8). You should understand that minor (or even major) fiddling with $[\alpha]$ has no effect at all on what final set of parameters $\mathbf{a}$ is reached, but affects only the iterative route that is taken in getting there. The condition at the $\chi^2$ minimum, that $\beta_k = 0$ for all $k$, is independent of how $[\alpha]$ is defined.
Levenberg-Marquardt Method
Marquardt [1] has put forth an elegant method, related to an earlier suggestion of Levenberg, for varying smoothly between the extremes of the inverse-Hessian method (15.5.9) and the steepest descent method (15.5.10). The latter method is used far from the minimum, switching continuously to the former as the minimum is approached. This Levenberg-Marquardt method (also called the Marquardt method) works very well in practice and has become the standard of nonlinear least-squares routines.
The method is based on two elementary, but important, insights. Consider the constant in equation (15.5.10). What should it be, even in order of magnitude? What sets its scale? There is no information about the answer in the gradient. That tells only the slope, not how far that slope extends. Marquardt's first insight is that the components of the Hessian matrix, even if they are not usable in any precise fashion, give some information about the order-of-magnitude scale of the problem.
The quantity $\chi^2$ is nondimensional, i.e., is a pure number; this is evident from its definition (15.5.5). On the other hand, $\beta_k$ has the dimensions of $1/a_k$, which may well be dimensional, i.e., have units like cm$^{-1}$, or kilowatt-hours, or whatever. (In fact, each component of $\beta_k$ can have different dimensions!) The constant of proportionality between $\beta_k$ and $\delta a_k$ must therefore have the dimensions of $a_k^2$.
Scan the components of $[\alpha]$ and you see that there is only one obvious quantity with these dimensions, and that is $1/\alpha_{kk}$, the reciprocal of the diagonal element. So that must set the scale of the constant. But that scale might itself be too big. So let's divide the constant by some (nondimensional) fudge factor $\lambda$, with the possibility of setting $\lambda \gg 1$ to cut down the step. In other words, replace equation (15.5.10) by

$$\delta a_l = \frac{1}{\lambda\, \alpha_{ll}}\, \beta_l \qquad {\rm or} \qquad \lambda\, \alpha_{ll}\, \delta a_l = \beta_l \tag{15.5.12}$$
It is necessary that $\alpha_{ll}$ be positive, but this is guaranteed by definition (15.5.11), another reason for adopting that equation.
Marquardt's second insight is that equations (15.5.12) and (15.5.9) can be combined if we define a new matrix $[\alpha']$ by the following prescription

$$\alpha'_{jj} \equiv \alpha_{jj}\,(1 + \lambda) \qquad \alpha'_{jk} \equiv \alpha_{jk} \quad (j \neq k) \tag{15.5.13}$$

and then replace both (15.5.12) and (15.5.9) by

$$\sum_{l=1}^{M} \alpha'_{kl}\, \delta a_l = \beta_k \tag{15.5.14}$$
When $\lambda$ is very large, the matrix $[\alpha']$ is forced into being diagonally dominant, so equation (15.5.14) goes over to be identical to (15.5.12). On the other hand, as $\lambda$ approaches zero, equation (15.5.14) goes over to (15.5.9).
Given an initial guess for the set of fitted parameters $\mathbf{a}$, the recommended Marquardt recipe is as follows:

• Compute $\chi^2(\mathbf{a})$.
• Pick a modest value for $\lambda$, say $\lambda = 0.001$.
• (†) Solve the linear equations (15.5.14) for $\delta\mathbf{a}$ and evaluate $\chi^2(\mathbf{a} + \delta\mathbf{a})$.
• If $\chi^2(\mathbf{a} + \delta\mathbf{a}) \geq \chi^2(\mathbf{a})$, increase $\lambda$ by a factor of 10 (or any other substantial factor) and go back to (†).
• If $\chi^2(\mathbf{a} + \delta\mathbf{a}) < \chi^2(\mathbf{a})$, decrease $\lambda$ by a factor of 10, update the trial solution $\mathbf{a} \leftarrow \mathbf{a} + \delta\mathbf{a}$, and go back to (†).
Also necessary is a condition for stopping. Iterating to convergence (to machine accuracy or to the roundoff limit) is generally wasteful and unnecessary since the minimum is at best only a statistical estimate of the parameters $\mathbf{a}$. As we will see in §15.6, a change in the parameters that changes $\chi^2$ by an amount $\ll 1$ is never statistically meaningful.
Furthermore, it is not uncommon to find the parameters wandering around near the minimum in a flat valley of complicated topography. The reason is that Marquardt's method generalizes the method of normal equations (§15.4), hence has the same problem as that method with regard to near-degeneracy of the minimum. Outright failure by a zero pivot is possible, but unlikely. More often, a small pivot will generate a large correction which is then rejected, the value of $\lambda$ being then increased. For sufficiently large $\lambda$ the matrix $[\alpha']$ is positive definite and can have no small pivots. Thus the method does tend to stay away from zero pivots, but at the cost of a tendency to wander around doing steepest descent in very un-steep degenerate valleys.
These considerations suggest that, in practice, one might as well stop iterating on the first or second occasion that $\chi^2$ decreases by a negligible amount, say either less than 0.01 absolutely or (in case roundoff prevents that being reached) some fractional amount like $10^{-3}$. Don't stop after a step where $\chi^2$ increases: That only shows that $\lambda$ has not yet adjusted itself optimally.
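This stopping rule is simple to encode. The fragment below is a minimal sketch of our own, not one of the book's listings; the threshold names and the two-occasion counter are illustrative choices:

/* Sketch of the stopping rule described above; not a Numerical Recipes
   listing. Names and thresholds are illustrative assumptions. */
#define CHI2_ABS_TOL  0.01      /* "less than 0.01 absolutely" */
#define CHI2_FRAC_TOL 1.0e-3    /* "some fractional amount like 10^-3" */

/* Call after each iteration with the previous and current chi-square.
   *nnegligible counts successive negligible decreases; stop when it
   reaches 2, per the "first or second occasion" advice above. */
int lm_converged(float ochisq, float chisq, int *nnegligible)
{
	float drop=ochisq-chisq;

	if (drop <= 0.0) return 0;          /* uphill or unchanged: never stop here */
	if (drop < CHI2_ABS_TOL || drop < CHI2_FRAC_TOL*ochisq)
		return ++(*nnegligible) >= 2;   /* negligible decrease */
	*nnegligible=0;                     /* significant progress: reset count */
	return 0;
}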
Once the acceptable minimum has been found, one wants to set $\lambda = 0$ and compute the matrix

$$[C] \equiv [\alpha]^{-1} \tag{15.5.15}$$

which, as before, is the estimated covariance matrix of the standard errors in the fitted parameters $\mathbf{a}$ (see next section).
The following pair of functions encodes Marquardt's method for nonlinear parameter estimation. Much of the organization matches that used in lfit of §15.4. In particular the array ia[1..ma] must be input with components one or zero corresponding to whether the respective parameter values a[1..ma] are to be fitted for or held fixed at their input values, respectively.
The routine mrqmin performs one iteration of Marquardt's method. It is first called (once) with alamda < 0, which signals the routine to initialize. alamda is set on the first and all subsequent calls to the suggested value of $\lambda$ for the next iteration; a and chisq are always given back as the best parameters found so far and their $\chi^2$. When convergence is deemed satisfactory, set alamda to zero before a final call. The matrices alpha and covar (which were used as workspace in all previous calls) will then be set to the curvature and covariance matrices for the converged parameter values. The arguments alpha, a, and chisq must not be modified between calls, nor should alamda be, except to set it to zero for the final call. When an uphill step is taken, chisq and a are given back with their input (best) values, but alamda is set to an increased value.
The routine mrqmin calls the routine mrqcof for the computation of the matrix $[\alpha]$ (equation 15.5.11) and vector $\beta$ (equations 15.5.6 and 15.5.8). In turn mrqcof calls the user-supplied routine funcs(x,a,y,dyda), which for input values x $= x_i$ and a $= \mathbf{a}$ calculates the model function y $= y(x_i; \mathbf{a})$ and the vector of derivatives dyda $= \partial y / \partial a_k$.
#include "nrutil.h"

void mrqmin(float x[], float y[], float sig[], int ndata, float a[], int ia[],
	int ma, float **covar, float **alpha, float *chisq,
	void (*funcs)(float, float [], float *, float [], int), float *alamda)
Levenberg-Marquardt method, attempting to reduce the value $\chi^2$ of a fit between a set of data points x[1..ndata], y[1..ndata] with individual standard deviations sig[1..ndata], and a nonlinear function dependent on ma coefficients a[1..ma]. The input array ia[1..ma] indicates by nonzero entries those components of a that should be fitted for, and by zero entries those components that should be held fixed at their input values. The program returns current best-fit values for the parameters a[1..ma], and $\chi^2$ = chisq. The arrays covar[1..ma][1..ma], alpha[1..ma][1..ma] are used as working space during most iterations. Supply a routine funcs(x,a,yfit,dyda,ma) that evaluates the fitting function yfit, and its derivatives dyda[1..ma] with respect to the fitting parameters a at x. On the first call provide an initial guess for the parameters a, and set alamda<0 for initialization (which then sets alamda=.001). If a step succeeds chisq becomes smaller and alamda decreases by a factor of 10. If a step fails alamda grows by a factor of 10. You must call this
routine repeatedly until convergence is achieved. Then, make one final call with alamda=0, so that covar[1..ma][1..ma] returns the covariance matrix, and alpha the curvature matrix. (Parameters held fixed will return zero covariances.)
{
	void covsrt(float **covar, int ma, int ia[], int mfit);
	void gaussj(float **a, int n, float **b, int m);
	void mrqcof(float x[], float y[], float sig[], int ndata, float a[],
		int ia[], int ma, float **alpha, float beta[], float *chisq,
		void (*funcs)(float, float [], float *, float [], int));
	int j,k,l;
	static int mfit;
	static float ochisq,*atry,*beta,*da,**oneda;

	if (*alamda < 0.0) {                    /* Initialization. */
		atry=vector(1,ma);
		beta=vector(1,ma);
		da=vector(1,ma);
		for (mfit=0,j=1;j<=ma;j++)
			if (ia[j]) mfit++;
		oneda=matrix(1,mfit,1,1);
		*alamda=0.001;
		mrqcof(x,y,sig,ndata,a,ia,ma,alpha,beta,chisq,funcs);
		ochisq=(*chisq);
		for (j=1;j<=ma;j++) atry[j]=a[j];
	}
	for (j=1;j<=mfit;j++) {                 /* Alter linearized fitting matrix, by augmenting diagonal elements. */
		for (k=1;k<=mfit;k++) covar[j][k]=alpha[j][k];
		covar[j][j]=alpha[j][j]*(1.0+(*alamda));
		oneda[j][1]=beta[j];
	}
	gaussj(covar,mfit,oneda,1);             /* Matrix solution. */
	for (j=1;j<=mfit;j++) da[j]=oneda[j][1];
	if (*alamda == 0.0) {                   /* Once converged, evaluate covariance matrix. */
		covsrt(covar,ma,ia,mfit);
		covsrt(alpha,ma,ia,mfit);           /* Spread out alpha to its full size too. */
		free_matrix(oneda,1,mfit,1,1);
		free_vector(da,1,ma);
		free_vector(beta,1,ma);
		free_vector(atry,1,ma);
		return;
	}
	for (j=0,l=1;l<=ma;l++)                 /* Did the trial succeed? */
		if (ia[l]) atry[l]=a[l]+da[++j];
	mrqcof(x,y,sig,ndata,atry,ia,ma,covar,da,chisq,funcs);
	if (*chisq < ochisq) {                  /* Success, accept the new solution. */
		*alamda *= 0.1;
		ochisq=(*chisq);
		for (j=1;j<=mfit;j++) {
			for (k=1;k<=mfit;k++) alpha[j][k]=covar[j][k];
			beta[j]=da[j];
		}
		for (l=1;l<=ma;l++) a[l]=atry[l];
	} else {                                /* Failure, increase alamda and return. */
		*alamda *= 10.0;
		*chisq=ochisq;
	}
}
Notice the use of the routine covsrt from §15.4. This is merely for rearranging the covariance matrix covar into the order of all ma parameters. The above routine also makes use of
#include "nrutil.h"

void mrqcof(float x[], float y[], float sig[], int ndata, float a[], int ia[],
	int ma, float **alpha, float beta[], float *chisq,
	void (*funcs)(float, float [], float *, float [], int))
Used by mrqmin to evaluate the linearized fitting matrix alpha, and vector beta as in (15.5.8), and to calculate $\chi^2$.
{
	int i,j,k,l,m,mfit=0;
	float ymod,wt,sig2i,dy,*dyda;

	dyda=vector(1,ma);
	for (j=1;j<=ma;j++)
		if (ia[j]) mfit++;
	for (j=1;j<=mfit;j++) {                 /* Initialize (symmetric) alpha, beta. */
		for (k=1;k<=j;k++) alpha[j][k]=0.0;
		beta[j]=0.0;
	}
	*chisq=0.0;
	for (i=1;i<=ndata;i++) {                /* Summation loop over all data. */
		(*funcs)(x[i],a,&ymod,dyda,ma);
		sig2i=1.0/(sig[i]*sig[i]);
		dy=y[i]-ymod;
		for (j=0,l=1;l<=ma;l++) {
			if (ia[l]) {
				wt=dyda[l]*sig2i;
				for (j++,k=0,m=1;m<=l;m++)
					if (ia[m]) alpha[j][++k] += wt*dyda[m];
				beta[j] += dy*wt;
			}
		}
		*chisq += dy*dy*sig2i;              /* And find chi-square. */
	}
	for (j=2;j<=mfit;j++)                   /* Fill in the symmetric side. */
		for (k=1;k<j;k++) alpha[k][j]=alpha[j][k];
	free_vector(dyda,1,ma);
}
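Because mrqcof takes the user-supplied derivatives dyda entirely on faith, a wrong analytic derivative degrades the fit silently. Before trusting results it can pay to compare dyda against centered finite differences at a few representative points. The following fragment is our own sketch, not a book listing; the function name and step-size heuristic are illustrative assumptions:

#include <math.h>
#include <stdio.h>
#include "nrutil.h"

/* Our own sketch (not a Numerical Recipes listing): compare the analytic
   derivatives returned by a funcs routine against centered finite
   differences at one test point x. */
void chkderivs(void (*funcs)(float, float [], float *, float [], int),
	float x, float a[], int ma)
{
	int k;
	float y,yp,ym,h,save,*dyda,*scratch;

	dyda=vector(1,ma);
	scratch=vector(1,ma);
	(*funcs)(x,a,&y,dyda,ma);               /* analytic derivatives at a */
	for (k=1;k<=ma;k++) {
		save=a[k];
		h=1.0e-3*fabs(save)+1.0e-6;         /* crude relative-plus-absolute step */
		a[k]=save+h; (*funcs)(x,a,&yp,scratch,ma);
		a[k]=save-h; (*funcs)(x,a,&ym,scratch,ma);
		a[k]=save;                          /* restore the parameter */
		printf("k=%d  analytic=%g  numeric=%g\n",
			k,dyda[k],(yp-ym)/(2.0*h));     /* centered difference */
	}
	free_vector(scratch,1,ma);
	free_vector(dyda,1,ma);
}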
Example
The following function fgauss is an example of a user-supplied function funcs. Used with the above routine mrqmin (in turn using mrqcof, covsrt, and gaussj), it fits for the model

$$y(x) = \sum_{k=1}^{K} B_k \exp\left[ -\left( \frac{x - E_k}{G_k} \right)^2 \right] \tag{15.5.16}$$

which is a sum of $K$ Gaussians, each having a variable position, amplitude, and width. We store the parameters in the order $B_1, E_1, G_1, B_2, E_2, G_2, \ldots, B_K, E_K, G_K$.
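For reference, the partial derivatives that the fitting routine needs follow directly from (15.5.16). Writing ${\rm arg}_k \equiv (x - E_k)/G_k$,

$$\frac{\partial y}{\partial B_k} = e^{-{\rm arg}_k^2} \qquad \frac{\partial y}{\partial E_k} = \frac{2 B_k\, {\rm arg}_k}{G_k}\, e^{-{\rm arg}_k^2} \qquad \frac{\partial y}{\partial G_k} = \frac{2 B_k\, {\rm arg}_k^2}{G_k}\, e^{-{\rm arg}_k^2}$$

These are exactly the quantities stored in dyda[i], dyda[i+1], and dyda[i+2] in the listing below.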
#include <math.h>
void fgauss(float x, float a[], float *y, float dyda[], int na)
y(x; a) is the sum of na/3 Gaussians (15.5.16). The amplitude, center, and width of the Gaussians are stored in consecutive locations of a: a[i] $= B_k$, a[i+1] $= E_k$, a[i+2] $= G_k$, $k = 1, \ldots,$ na/3. The dimensions of the arrays are a[1..na], dyda[1..na].
{
int i;
float fac,ex,arg;
*y=0.0;
for (i=1;i<=na-1;i+=3) {
arg=(x-a[i+1])/a[i+2];
ex=exp(-arg*arg);
fac=a[i]*ex*2.0*arg;
*y += a[i]*ex;
dyda[i]=ex;
dyda[i+1]=fac/a[i+2];
dyda[i+2]=fac*arg/a[i+2];
}
}
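As a usage illustration, here is a minimal driver of our own (not one of the book's listings) that fits a single Gaussian. The data-loading step, initial guesses, and iteration cap are placeholders:

#include <stdio.h>
#include "nrutil.h"

#define NDATA 100   /* hypothetical number of data points */
#define MA 3        /* one Gaussian: B, E, G */

void mrqmin(float x[], float y[], float sig[], int ndata, float a[],
	int ia[], int ma, float **covar, float **alpha, float *chisq,
	void (*funcs)(float, float [], float *, float [], int), float *alamda);
void fgauss(float x, float a[], float *y, float dyda[], int na);

int main(void)
{
	int i,itst,ia[MA+1];
	float alamda,chisq,ochisq,*x,*y,*sig,**covar,**alpha;
	float a[MA+1]={0.0,5.0,2.0,3.0};     /* a[1..3]: illustrative guesses for B, E, G */

	x=vector(1,NDATA); y=vector(1,NDATA); sig=vector(1,NDATA);
	covar=matrix(1,MA,1,MA); alpha=matrix(1,MA,1,MA);
	/* ... load x[1..NDATA], y[1..NDATA], sig[1..NDATA] here ... */
	for (i=1;i<=MA;i++) ia[i]=1;         /* fit all three parameters */
	alamda = -1.0;                       /* negative: tell mrqmin to initialize */
	mrqmin(x,y,sig,NDATA,a,ia,MA,covar,alpha,&chisq,fgauss,&alamda);
	for (itst=0,i=0;i<100 && itst<2;i++) {   /* arbitrary cap of 100 iterations */
		ochisq=chisq;
		mrqmin(x,y,sig,NDATA,a,ia,MA,covar,alpha,&chisq,fgauss,&alamda);
		if (chisq < ochisq && ochisq-chisq < 0.01) itst++;  /* count negligible drops */
	}
	alamda=0.0;                          /* final call: fill covar and alpha */
	mrqmin(x,y,sig,NDATA,a,ia,MA,covar,alpha,&chisq,fgauss,&alamda);
	printf("B=%g E=%g G=%g chisq=%g\n",a[1],a[2],a[3],chisq);
	free_matrix(alpha,1,MA,1,MA); free_matrix(covar,1,MA,1,MA);
	free_vector(sig,1,NDATA); free_vector(y,1,NDATA); free_vector(x,1,NDATA);
	return 0;
}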
More Advanced Methods for Nonlinear Least Squares
The Levenberg-Marquardt algorithm can be implemented as a model-trust region method for minimization (see §9.7 and ref. [2]) applied to the special case of a least squares function. A code of this kind due to Moré [3] can be found in MINPACK [4]. Another algorithm for nonlinear least-squares keeps the second-derivative term we dropped in the Levenberg-Marquardt method whenever it would be better to do so. These methods are called "full Newton-type" methods and are reputed to be more robust than Levenberg-Marquardt, but more complex. One implementation is the code NL2SOL [5].
CITED REFERENCES AND FURTHER READING:
Bevington, P.R. 1969, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill), Chapter 11.
Marquardt, D.W. 1963, Journal of the Society for Industrial and Applied Mathematics, vol. 11, pp. 431–441. [1]
Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter III.2 (by J.E. Dennis).
Dennis, J.E., and Schnabel, R.B. 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs, NJ: Prentice-Hall). [2]
Moré, J.J. 1977, in Numerical Analysis, Lecture Notes in Mathematics, vol. 630, G.A. Watson, ed. (Berlin: Springer-Verlag), pp. 105–116. [3]
Moré, J.J., Garbow, B.S., and Hillstrom, K.E. 1980, User Guide for MINPACK-1, Argonne National Laboratory Report ANL-80-74. [4]
Dennis, J.E., Gay, D.M., and Welsch, R.E. 1981, ACM Transactions on Mathematical Software, vol. 7, pp. 348–368; op. cit., pp. 369–383. [5]
15.6 Confidence Limits on Estimated Model Parameters
Several times already in this chapter we have made statements about the standard errors, or uncertainties, in a set of $M$ estimated parameters $\mathbf{a}$. We have given some formulas for computing standard deviations or variances of individual parameters (equations 15.2.9, 15.4.15, 15.4.19), as well as some formulas for covariances between pairs of parameters (equation 15.2.10; remark following equation 15.4.15; equation 15.4.20; equation 15.5.15).
In this section, we want to be more explicit regarding the precise meaning of these quantitative uncertainties, and to give further information about how quantitative confidence limits on fitted parameters can be estimated. The subject can get somewhat technical, and even somewhat confusing, so we will try to make precise statements, even when they must be offered without proof.
Figure 15.6.1 shows the conceptual scheme of an experiment that measures a set of parameters. There is some underlying true set of parameters $\mathbf{a}_{\rm true}$ that are known to Mother Nature but hidden from the experimenter. These true parameters are statistically realized, along with random measurement errors, as a measured data set, which we will symbolize as $\mathcal{D}_{(0)}$. The data set $\mathcal{D}_{(0)}$ is known to the experimenter. He or she fits the data to a model by $\chi^2$ minimization or some other technique, and obtains measured, i.e., fitted, values for the parameters, which we here denote $\mathbf{a}_{(0)}$.
Because measurement errors have a random component, $\mathcal{D}_{(0)}$ is not a unique realization of the true parameters $\mathbf{a}_{\rm true}$. Rather, there are infinitely many other realizations of the true parameters as hypothetical data sets each of which could have been the one measured, but happened not to be. Let us symbolize these by $\mathcal{D}_{(1)}, \mathcal{D}_{(2)}, \ldots$. Each one, had it been realized, would have given a slightly different set of fitted parameters, $\mathbf{a}_{(1)}, \mathbf{a}_{(2)}, \ldots$, respectively. These parameter sets $\mathbf{a}_{(i)}$ therefore occur with some probability distribution in the $M$-dimensional space of all possible parameter sets $\mathbf{a}$. The actual measured set $\mathbf{a}_{(0)}$ is one member drawn from this distribution.
Even more interesting than the probability distribution of $\mathbf{a}_{(i)}$ would be the distribution of the difference $\mathbf{a}_{(i)} - \mathbf{a}_{\rm true}$. This distribution differs from the former one by a translation that puts Mother Nature's true value at the origin. If we knew this distribution, we would know everything that there is to know about the quantitative uncertainties in our experimental measurement $\mathbf{a}_{(0)}$.
So the name of the game is to find some way of estimating or approximating the probability distribution of $\mathbf{a}_{(i)} - \mathbf{a}_{\rm true}$ without knowing $\mathbf{a}_{\rm true}$ and without having available to us an infinite universe of hypothetical data sets.
Monte Carlo Simulation of Synthetic Data Sets
Although the measured parameter set $\mathbf{a}_{(0)}$ is not the true one, let us consider a fictitious world in which it was the true one. Since we hope that our measured parameters are not too wrong, we hope that that fictitious world is not too different from the actual world with parameters $\mathbf{a}_{\rm true}$. In particular, let us hope (no, let us assume) that the shape of the probability distribution $\mathbf{a}_{(i)} - \mathbf{a}_{(0)}$ in the fictitious world is the same, or very nearly the same, as the shape of the probability distribution $\mathbf{a}_{(i)} - \mathbf{a}_{\rm true}$ in the real world.