
Fitting a Logistic Curve to Data

David Arnold
February 24, 2002

1 Introduction
This activity is based on an excellent article, Fitting a Logistic Curve to Data, by Fabio Cavallini,
which appears in the College Mathematics Journal, 1993, Volume 24, Number 3, Pages: 247-253.
In his article, Dr. Cavallini describes a number of Mathematica routines he designed to fit a logistic
curve to a given set of data. In this activity, we will design Matlab routines to accomplish a similar
fitting of the logistic equation to Dr. Cavallini’s data set.

2 The Data Set


To quote Dr. Cavallini, “Ecological problems are nowadays of general concern. In particular, in
Italy much attention is being paid to the problem of algal blooms in the Adriatic Sea and this has
led also to an increase of interest in mathematical ecology at all levels, from high school teaching
to advanced research. The logistic differential equation, dealt with in the next section, is a classical
but still useful model for describing the dynamics of a one-species population in an environment
with limited resources. For example, the data in Table 1 represent the time evolution of an algal
sample taken in the Adriatic Sea and do seem to follow a logistic curve.”

Time (days)    Biomass (mm^2)
11             0.00476
15             0.0105
18             0.0207
23             0.0619
26             0.337
31             0.74
39             1.7
44             2.45
54             3.5
64             4.5
74             5.09

Table 1: Measured data. Time is expressed in days and biomass is expressed in mm^2, since what
is actually measured is the surface covered by biomass in a microscope sample.

We begin by obtaining a plot of the data. First, enter the data in Matlab.
T=[11,15,18,23,26,31,39,44,54,64,74]
M=[0.00476,0.0105,0.0207,0.0619,0.337,0.74,1.7,2.45,3.5,4.5,5.09]
For upcoming calculations, we need to ensure that our data is stored in column vectors. This is
accomplished with Matlab’s transpose operator.
T=T';
M=M';

Next, plot the data as discrete points with the following command.
plot(T,M,'o')
A title and axis labels add a professional touch.
title('Time evolution of algal sample.')
xlabel('Time (days)')
ylabel('Biomass (mm^2)')
These commands will create an image similar to that shown in Figure 1.

Figure 1: Plot of the data listed in Table 1 (biomass in mm^2 versus time in days).

3 The Logistic Equation


Dr. Cavallini uses the logistic equation in the form
$$\frac{dm}{dt} = rm\left(1 - \frac{m}{K}\right), \tag{1}$$
where $t$ is time, $m = m(t)$ is the biomass, and $r$ and $K$ are positive parameters. Using separation
of variables, it can be shown that the solution of the logistic equation is
$$m(t) = \frac{K}{1 + Ce^{-rt}}, \tag{2}$$
where $C$ is an arbitrary constant. It is interesting to allow Matlab to provide a solution.

syms m r K t
m=dsolve('Dm=r*m*(1-m/K)','t')
Matlab responds with the following solution.
m =

K/(1+exp(-r*t)*C1*K)
Letting $C = KC_1$ in this result provides equation (2).
A somewhat tricky calculation provides the second derivative of $m$ with respect to $t$.
$$m''(t) = \frac{CKr^2 e^{rt}\left(C - e^{rt}\right)}{\left(C + e^{rt}\right)^3}. \tag{3}$$
Of course, you can also allow Matlab to do this calculation for you.
m2=diff(m,t,2)

m2 =

2*K^3/(1+exp(-r*t)*C1*K)^3*r^2*exp(-r*t)^2*C1^2-K^2/(1+exp(-r*t)*C1*K)^2*r^2*exp(-r*t)*C1
Of course, you will want to simplify this result.
m2=simple(m2)

m2 =

K^2*r^2*exp(-r*t)*C1*(exp(-r*t)*C1*K-1)/(1+exp(-r*t)*C1*K)^3
A little algebraic manipulation (let $C = KC_1$ and multiply numerator and denominator by $e^{3rt}$)
will reveal that this is identical to equation (3).
The graph of $m = m(t)$ has a point of inflection when the second derivative in equation (3) is
zero. The second derivative provided by equation (3) is zero when its numerator is zero; that is,
when $C = e^{rt}$. So, if we put $C = e^{rt_0}$, where $t_0$ is the time when the point of inflection occurs, then
the solution (2) becomes
$$m(t) = \frac{K}{1 + e^{-r(t - t_0)}}. \tag{4}$$

Note that the biomass $m$ approaches the carrying capacity $K$ as $t \to \infty$.
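
As a quick numerical sanity check on equation (4), the following sketch integrates equation (1) with ode45 and compares the result with the closed form. The parameter values r = 0.12, K = 5, and t0 = 45 are illustrative assumptions rather than fitted results, and the names m_exact, f, maxdiff, and m_half are our own.

```matlab
% Illustrative parameter values (assumed, not taken from the fit)
r = 0.12; K = 5; t0 = 45;

% Closed-form solution, equation (4)
m_exact = @(t) K./(1+exp(-r*(t-t0)));

% Right-hand side of the logistic equation (1)
f = @(t,m) r*m.*(1-m/K);

% Integrate numerically, starting on the exact curve at t = 10
tspan = linspace(10,75,50);
opts = odeset('RelTol',1e-8,'AbsTol',1e-10);
[t,m_num] = ode45(f,tspan,m_exact(tspan(1)),opts);

% Largest gap between the numerical and closed-form solutions
maxdiff = max(abs(m_num-m_exact(t)));

% At the inflection point the biomass is half the carrying capacity
m_half = m_exact(t0);
```

If the two solutions agree, maxdiff should be very small, and m_half equals K/2, which is a handy way to read t0 off a plot of the data.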

4 The Curve Fitting Algorithm


Dr. Cavallini now states: “We now show a procedure for fitting, in the least squares sense, a logistic
curve (4) to a given data set $(t_i, m_i)$ for $i = 1, 2, \ldots, n$. In symbols, the problem is to minimize,
for $K$, $r$ and $t_0$ varying in the real line, the error
$$e = \sum_{i=1}^{n} \left(m(t_i) - m_i\right)^2. \tag{5}$$”

Figure 2: Finding the square of the error. A candidate logistic curve is drawn over the data, with a
data point $(t_i, m_i)$ and the curve point $(t_i, m(t_i))$ at the same abscissa marked.

This warrants some explanation. In Figure 2, we’ve plotted the data set as discrete points and
overlaid a potential solution to our curve fitting goal. Note that the coordinates of the given data
point are $(t_i, m_i)$, while the coordinates of the point with the same abscissa on the fitted curve are
$(t_i, m(t_i))$. The “error” is $m(t_i) - m_i$. Because some of the data points lie below the fitted curve
while others lie above, we square to ensure that each error contributes a nonnegative amount. That is,
the squared error made at the time value $t_i$ is
$$\left(m(t_i) - m_i\right)^2. \tag{6}$$
Thus, the total squared error in fitting the curve to our $n$ data points is given by equation (5).
Clearly, the object of the game is to minimize the total squared error presented by equation (5).
This is why we say that we are fitting a logistic curve to the data set in a “least squares sense.”
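
To make equation (5) concrete, here is a minimal sketch that evaluates the total squared error for one trial curve. The trial values K = 5, r = 0.1, and t0 = 50 are assumptions chosen only for illustration, and the variable names residuals and e are ours.

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% Trial parameter values (assumed for illustration, not the fitted values)
K = 5; r = 0.1; t0 = 50;

% Signed errors m(t_i) - m_i, one per data point
residuals = K./(1+exp(-r*(T-t0))) - M;

% Total squared error, equation (5)
e = sum(residuals.^2);
```

Changing any of the three trial values changes e; the fitting problem is to find the combination that makes e as small as possible.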

4.1 Eliminating a Parameter


The difficulty inherent in solving this least squares problem is evident in the nonlinear equation
(4). There are three parameters. Most numerical optimization routines require that the user make
a guess at the solution before the routine proceeds. If we eliminate K as one of the parameters,
then there will remain only two, $r$ and $t_0$. As we shall see, with only two parameters, there are
some nice Matlab routines that we can apply to find an estimate of the least squares solution.
So, we let
$$m(t) = Kh(t), \tag{7}$$
where
$$h(t) = \frac{1}{1 + e^{-r(t - t_0)}}. \tag{8}$$

4.2 The Power of Linear Algebra
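At this point, Dr. Cavallini takes advantage of the power of linear algebra to simplify the calculations.
He starts by saying that the error given in equation (5) can be written
$$e = K^2\langle \mathbf{H}, \mathbf{H}\rangle - 2K\langle \mathbf{H}, \mathbf{M}\rangle + \langle \mathbf{M}, \mathbf{M}\rangle, \tag{9}$$
where $\mathbf{H}$ and $\mathbf{M}$ are the vectors
$$\mathbf{H} = \langle h(t_1), h(t_2), \ldots, h(t_n)\rangle \quad\text{and}\quad \mathbf{M} = \langle m_1, m_2, \ldots, m_n\rangle.$$
This statement definitely warrants some explanation, particularly if you haven’t taken linear algebra
as yet.
First, consider two vectors
$$\mathbf{a} = \langle a_1, a_2, \ldots, a_n\rangle \quad\text{and}\quad \mathbf{b} = \langle b_1, b_2, \ldots, b_n\rangle.$$
The dot product of $\mathbf{a}$ and $\mathbf{b}$, written $\langle \mathbf{a}, \mathbf{b}\rangle$, is defined as
$$\langle \mathbf{a}, \mathbf{b}\rangle = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. \tag{10}$$
The dot product possesses a number of useful algebraic properties.
1. The dot product is commutative.
$$\langle \mathbf{a}, \mathbf{b}\rangle = \langle \mathbf{b}, \mathbf{a}\rangle$$
2. A scalar can be moved about according to the following rule.
$$\langle c\mathbf{a}, \mathbf{b}\rangle = \langle \mathbf{a}, c\mathbf{b}\rangle = c\langle \mathbf{a}, \mathbf{b}\rangle$$
3. The dot product is distributive with respect to addition.
$$\langle \mathbf{a}, \mathbf{b} + \mathbf{c}\rangle = \langle \mathbf{a}, \mathbf{b}\rangle + \langle \mathbf{a}, \mathbf{c}\rangle$$

The Pythagorean Theorem holds equally well in $n$ dimensions and we define the length of a
vector $\mathbf{a}$, denoted by $\|\mathbf{a}\|$, by
$$\|\mathbf{a}\|^2 = a_1^2 + a_2^2 + \cdots + a_n^2. \tag{11}$$
It is important to note that
$$\|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a}\rangle. \tag{12}$$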
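We are now in a position to explain Dr. Cavallini’s statement in equation (9). First, note that
$$\begin{aligned}
e &= \sum_{i=1}^{n} \left(m(t_i) - m_i\right)^2 \\
  &= \left(m(t_1) - m_1\right)^2 + \cdots + \left(m(t_n) - m_n\right)^2 \\
  &= \left(Kh(t_1) - m_1\right)^2 + \cdots + \left(Kh(t_n) - m_n\right)^2 \\
  &= \|\langle Kh(t_1) - m_1, \ldots, Kh(t_n) - m_n\rangle\|^2 \\
  &= \|K\langle h(t_1), \ldots, h(t_n)\rangle - \langle m_1, \ldots, m_n\rangle\|^2 \\
  &= \|K\mathbf{H} - \mathbf{M}\|^2.
\end{aligned}$$
Now, using equation (12) and the algebraic properties of the dot product, we can write
$$\begin{aligned}
e &= \|K\mathbf{H} - \mathbf{M}\|^2 \\
  &= \langle K\mathbf{H} - \mathbf{M}, K\mathbf{H} - \mathbf{M}\rangle \\
  &= K^2\langle \mathbf{H}, \mathbf{H}\rangle - K\langle \mathbf{H}, \mathbf{M}\rangle - K\langle \mathbf{M}, \mathbf{H}\rangle + \langle \mathbf{M}, \mathbf{M}\rangle \\
  &= K^2\langle \mathbf{H}, \mathbf{H}\rangle - 2K\langle \mathbf{H}, \mathbf{M}\rangle + \langle \mathbf{M}, \mathbf{M}\rangle.
\end{aligned}$$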
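
The identity in equation (9) is easy to check numerically. The sketch below compares the squared length of KH − M with the expanded dot-product form for one trial parameter set; the values K = 5, r = 0.1, and t0 = 50 are assumed purely for illustration, and the names e_norm, e_expanded, and gap are ours.

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% Trial parameter values (assumed for illustration)
K = 5; r = 0.1; t0 = 50;

% The vector H of equation (8) evaluated at each time value
H = 1./(1+exp(-r*(T-t0)));

% Error as a squared length, ||K*H - M||^2
e_norm = norm(K*H-M)^2;

% Error in the expanded form of equation (9)
e_expanded = K^2*(H'*H) - 2*K*(H'*M) + M'*M;

% The two forms should agree up to rounding error
gap = abs(e_norm-e_expanded);
```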

4.3 Minimizing
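In equation (9), it’s important to note that the total squared error $e$ still contains three parameters,
$K$, $r$, and $t_0$. The idea is to adjust these parameters so as to minimize $e$. In single variable calculus,
you find a minimum by taking the first derivative and setting it equal to zero. The same idea applies in
multivariable calculus, except that we must take partial derivatives with respect
to each parameter. In this case, we set $\partial e/\partial K$ equal to zero and solve for $K$.
$$\begin{aligned}
\frac{\partial e}{\partial K} &= 0 \\
2K\langle \mathbf{H}, \mathbf{H}\rangle - 2\langle \mathbf{H}, \mathbf{M}\rangle &= 0
\end{aligned}$$
$$K = \frac{\langle \mathbf{H}, \mathbf{M}\rangle}{\langle \mathbf{H}, \mathbf{H}\rangle} \tag{13}$$

Now, substitute this result in equation (9). The first term becomes
$\langle \mathbf{H}, \mathbf{M}\rangle^2/\langle \mathbf{H}, \mathbf{H}\rangle$ and the second becomes
$-2\langle \mathbf{H}, \mathbf{M}\rangle^2/\langle \mathbf{H}, \mathbf{H}\rangle$, so that
$$e = \langle \mathbf{M}, \mathbf{M}\rangle - \frac{\langle \mathbf{H}, \mathbf{M}\rangle^2}{\langle \mathbf{H}, \mathbf{H}\rangle}. \tag{14}$$

Equation (14) contains just two parameters, $r$ and $t_0$, the parameter $K$ being eliminated. This
equation, we shall find, is much easier to deal with, due to the reduction in the number of parameters
present.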
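
The following sketch illustrates, for one fixed pair (r, t0), that the K given by equation (13) really does minimize the error and that the resulting error agrees with equation (14). The pair r = 0.1, t0 = 50 is an assumed trial value, and Kstar, e_of_K, and e14 are names introduced here for illustration.

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% Fixed trial values of r and t0 (assumed for illustration)
r = 0.1; t0 = 50;
H = 1./(1+exp(-r*(T-t0)));

% Best K for this (r,t0), equation (13)
Kstar = (H'*M)/(H'*H);

% Error of equation (5), viewed as a function of K alone
e_of_K = @(K) norm(K*H-M)^2;

% Error after eliminating K, equation (14)
e14 = M'*M - (H'*M)^2/(H'*H);

% Errors at Kstar and at nearby values of K for comparison
e_best  = e_of_K(Kstar);
e_above = e_of_K(Kstar+0.1);
e_below = e_of_K(Kstar-0.1);
```

Because e is a quadratic in K with positive leading coefficient, e_best is smaller than both e_above and e_below, and it matches e14 up to rounding.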

4.4 Writing a Function to Evaluate e


We will now plot the graph of $e$, as defined by equation (14), as a function of $r$ and $t_0$. The graph will
be a surface in three dimensions and we locate the solution to our least squares problem by locating
the lowest point on this “error surface.” We begin by defining a domain in the $rt_0$-plane.
It is a simple matter to determine an interval containing $t_0$, as it must lie in the range of the
given time data. Thus, considering the data in Table 1, the inflection point occurs at $t_0$, where $t_0$
is some value such that $11 \le t_0 \le 74$.
A tougher project is to determine a likely range for the growth rate $r$. We start by
taking the derivative of the function in equation (4), which gives
$$m'(t) = \frac{Kre^{-r(t - t_0)}}{\left(1 + e^{-r(t - t_0)}\right)^2}. \tag{15}$$
Thus,
$$m'(t_0) = \frac{Kr}{4},$$
or, equivalently,
$$r = \frac{4m'(t_0)}{K}. \tag{16}$$
If we examine the data plotted in Figure 1, it would appear that the point of inflection occurs near
the point having $t$-value $t = 44$, so let’s take $t_0 = 44$. We can get a close approximation for the slope
at $t_0$, that is, $m'(t_0)$, by calculating the slope of the line passing through the points immediately
preceding and following the point of inflection, that is, $(39, 1.7)$ and $(54, 3.5)$. Thus,
$$m'(t_0) \approx \frac{3.5 - 1.7}{54 - 39} \approx 0.12. \tag{17}$$
Substitute this result in equation (16), using the largest biomass as an approximation for $K$, that
is, $K \approx 5.09$. Thus,
$$r \approx \frac{4(0.12)}{5.09} \approx 0.0943. \tag{18}$$
Thus, as a starting point, let’s suppose that $r$ falls somewhere in the range defined by $0.01 \le r \le 0.6$.
If we find this approximation inadequate, we will adjust and try again.
r=linspace(0.01,0.6,40);
t0=linspace(11,74,40);
[r,t0]=meshgrid(r,t0);
The meshgrid command creates matrices r and t0 that define a “grid” of points, with r-values
running from 0.01 to 0.6 in 40 equally spaced steps, and t0-values running from 11 to 74 in 40 equally
spaced steps. After the command [r,t0]=meshgrid(r,t0), both r and t0 are matrices having 40
rows and 40 columns, each containing 1600 entries.
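
To see how meshgrid lays out its output, it can help to try it on a tiny grid first. The sketch below uses three r-values and two t0-values, chosen arbitrarily for illustration; the capitalized names R and T0 are ours, used so that the input vectors are not overwritten.

```matlab
% A tiny grid: three r-values and two t0-values (arbitrary illustration)
r = [0.1 0.2 0.3];
t0 = [20 40];

% Each output has one row per t0-value and one column per r-value
[R,T0] = meshgrid(r,t0);

% R repeats the r-values along every row:
%   R  = [0.1 0.2 0.3
%         0.1 0.2 0.3]
% T0 repeats the t0-values down every column:
%   T0 = [20 20 20
%         40 40 40]
```

The pair (R(i,j), T0(i,j)) is then the grid point at row i and column j, which is exactly how the grid is indexed when the error is evaluated at each point.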
Next, we need to evaluate e at each point of the grid. We will do this by writing a Matlab function
to do the work for us. However, because our intent is to later call one of Matlab’s optimization
routines to find the minimal e-value on our surface, we must write the function in a very special
manner.
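function e=myerror(x,t,m)
% x=[r;t0]; t and m are column vectors of time and biomass data
r=x(1);
t0=x(2);
h=1./(1+exp(-r*(t-t0)));   % equation (8) at each time value
e=m'*m-(h'*m)^2/(h'*h);    % equation (14)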
This routine warrants some additional explanation.
First, note that the inputs to the routine, x, t, and m, are special. The variable x contains a
column vector whose first entry is r and whose second entry is t0. Note that the first
two lines in the function pluck these values from the vector x and store them in the variables r and
t0, respectively.
Secondly, the variables t and m contain the time and biomass data. Note that these vectors will
hold the time and biomass data presented in Table 1.
Next, the vector h is computed using equation (8). Note the use of Matlab’s array operator ./
as this expression divides 1 by a vector of values. In this manner, we compute h(t) for each time
value in Table 1.

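The last line of the routine is especially tricky, but not quite as tricky if you know the pertinent
fact from linear algebra. That is, if $\mathbf{a}$ and $\mathbf{b}$ are column vectors,
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix},$$
then
$$\begin{aligned}
\langle \mathbf{a}, \mathbf{b}\rangle &= a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \\
&= \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
   \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \\
&= \mathbf{a}^T \mathbf{b}.
\end{aligned}$$

Thus, m'*m computes the dot product $\langle \mathbf{m}, \mathbf{m}\rangle$, h'*m computes the dot product $\langle \mathbf{h}, \mathbf{m}\rangle$, and h'*h
computes the dot product $\langle \mathbf{h}, \mathbf{h}\rangle$. Thus, the line e=m'*m-(h'*m)^2/(h'*h) computes the error as
defined by equation (14).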
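
A three-entry example makes the transpose trick easy to verify by hand. The vectors below are arbitrary illustrative values, and p1, p2, and p3 are names introduced here.

```matlab
% Two small column vectors (arbitrary illustrative values)
a = [1;2;3];
b = [4;5;6];

% Dot product as a row vector times a column vector: 1*4 + 2*5 + 3*6 = 32
p1 = a'*b;

% The same quantity written as an explicit sum of products
p2 = sum(a.*b);

% Matlab's built-in dot function gives the same result
p3 = dot(a,b);
```

All three expressions return 32. Note that the trick depends on a and b being column vectors; with row vectors, a'*b would instead produce a 3-by-3 matrix.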

4.5 Plotting the Error Surface


We now need to plot the error surface by evaluating the function myerror at each point of the grid
we defined earlier. We begin by determining the size of the matrices r and t0 (they are both the
same size, so determining the size of either one is sufficient).
[m,n]=size(r);
Next, prepare a matrix having the same size as matrices r and t0 to contain the error at each (r, t0)
pair.
e=zeros(size(r));
Now, run a double loop to evaluate the error at each point in the grid defined by matrices r and t0.
for i=1:m
for j=1:n
e(i,j)=myerror([r(i,j);t0(i,j)],T,M);
end
end
This last piece of code warrants a bit more explanation. Note that we call the function myerror by
passing it three arguments. The first argument is a column vector containing the current value of r
as its first entry and the current value of t0 as its second entry. Then we pass the time and biomass
vectors T and M that contain the data provided in Table 1. The function responds by calculating
the error, then places the answer in the appropriate position of the matrix e.
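
As an alternative to the double loop, the grid can be filled with arrayfun. This is only a sketch of one possible approach, not the method used in this activity; to keep it self-contained it replaces the file myerror.m with anonymous functions hfun and errfun (names of our own choosing) that implement equations (8) and (14).

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% The same 40-by-40 grid of (r,t0) pairs used in the text
[R,T0] = meshgrid(linspace(0.01,0.6,40),linspace(11,74,40));

% Equation (8), with x = [r; t0]
hfun = @(x,t) 1./(1+exp(-x(1)*(t-x(2))));

% Equation (14), mirroring the calling convention of myerror
errfun = @(x,t,m) m'*m - (hfun(x,t)'*m)^2/(hfun(x,t)'*hfun(x,t));

% Evaluate the error at every grid point without explicit loops
E = arrayfun(@(a,b) errfun([a;b],T,M),R,T0);

% Spot check one entry against a direct evaluation
spot = errfun([R(3,7);T0(3,7)],T,M);
```

The explicit loops are arguably easier to read; arrayfun mainly saves a few lines, and E has the same layout as the matrix e built by the loops.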
We can now easily plot the error surface.

mesh(r,t0,e)
We add a title and some appropriate axis labels.
xlabel('r')
ylabel('t_0')
zlabel('e')
title('Plotting the error versus the parameters r and t_0')
These commands produce the image shown in Figure 3.

Figure 3: A plot of the “error surface,” showing e as a function of r and t0.

The solution to the least squares problem
lies in our ability to locate the minimum error on this surface; i.e., the lowest point on this surface
on the given domain. This can be approximated by clicking the “rotate” icon in the figure window
and examining the surface at different viewing angles by dragging the figure with the mouse.
However, we can get a better indication of where the minimum lies on this surface by crafting a
contour plot. This is easy to do in Matlab and requires no further preparation on our part. We feed
Matlab’s contour command the same data we gave the mesh command, but we also pass another
parameter which forces the drawing of 40 contours.
contour(r,t0,e,40)
Again, we add labels and a title.
xlabel('r')
ylabel('t_0')
title('A contour map of the error versus r and t_0.')

Figure 4: A contour map of the error surface, with r on the horizontal axis and t0 on the vertical axis.

These commands were used to craft the image in Figure 4. The contour map argues for the existence
of a minimum near the point (r, t0) = (0.1, 50). You might easily be led to a better approximation by
refining the contour map, adding more contours, or specifying values at which you want the contours
drawn. Type help contour at the Matlab prompt to get a full discussion on the capabilities of
Matlab’s contour command. We only need a rough estimate to use as a starting point for Matlab’s
sophisticated optimization routines, so we’ll settle for (r, t0) = (0.1, 50).
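
A rough starting point can also be read off the grid numerically rather than by eye. The sketch below rebuilds the error matrix with the same double loop as before and picks out the grid point with the smallest error. It is a self-contained illustration under our own naming (hfun, errfun, E, r_guess, t0_guess), with anonymous functions standing in for myerror.m, and we make no claim about exactly which grid point it selects.

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% The same 40-by-40 grid of (r,t0) pairs used in the text
[R,T0] = meshgrid(linspace(0.01,0.6,40),linspace(11,74,40));

% Equations (8) and (14), with x = [r; t0]
hfun = @(x,t) 1./(1+exp(-x(1)*(t-x(2))));
errfun = @(x,t,m) m'*m - (hfun(x,t)'*m)^2/(hfun(x,t)'*hfun(x,t));

% Fill the error matrix with a double loop
E = zeros(size(R));
for i = 1:size(R,1)
    for j = 1:size(R,2)
        E(i,j) = errfun([R(i,j);T0(i,j)],T,M);
    end
end

% Smallest error on the grid and the (r,t0) pair where it occurs
[emin,idx] = min(E(:));
r_guess = R(idx);
t0_guess = T0(idx);
```

The pair (r_guess, t0_guess) is only as accurate as the grid spacing, but that is all an initial guess needs to be.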

4.6 Finding the Minimum Error


We will now use Matlab’s fminsearch command to find the minimal point on the error surface. For
a full description of the capabilities of fminsearch, type help fminsearch at Matlab’s prompt.
The calling syntax that we will use follows.
X = FMINSEARCH(FUN,X0,OPTIONS,P1,P2,...)
The arguments to fminsearch need some explanation. First, the argument FUN identifies the
function to be minimized. In our case, this is myerror, passed as the function handle @myerror. Then comes the argument
X0, a column vector containing the initial guess. In our case, we send the column vector [0.1;50].
Next, the argument OPTIONS is a structure containing options sent to the solver. These options
are set with Matlab’s optimset command. On our first effort, we will not use this feature and pass
an empty matrix to OPTIONS. Finally, any arguments placed after OPTIONS are passed along to FUN, and it is our
intent to pass the vectors containing the time and biomass data from Table 1. That is why we
defined the function myerror as we did.
There is little to do at this point as all preparations have already been completed. We need only
call the optimization routine, which will pass back the location of the minimum error, that is, the
minimizing values of r and t0, in the variable min.

min=fminsearch(@myerror,[0.1;50],[],T,M)

min =

0.1213
45.7748
Hence, r = 0.1213 and t0 = 45.7748 is the location of the minimum error on the error surface.
Finally, we use equation (13) to compute K. First, we pluck r and t0 from the variable min.
r=min(1);
t0=min(2);
Next, we compute the vector H.
H=1./(1+exp(-r*(T-t0)));
Finally, we compute the carrying capacity K using equation (13).
K=(H'*M)/(H'*H)

K =

5.0949
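
The same computation can be written with an anonymous function that captures the data, which avoids passing T and M as trailing arguments. The sketch below is one possible variant, not the call used above: the names hfun, errfun, obj, xmin, r_fit, t0_fit, and K_fit are ours, and the tolerances passed to optimset are assumed values chosen for illustration.

```matlab
% Data from Table 1, stored as column vectors
T = [11 15 18 23 26 31 39 44 54 64 74]';
M = [0.00476 0.0105 0.0207 0.0619 0.337 0.74 1.7 2.45 3.5 4.5 5.09]';

% Equations (8) and (14), with x = [r; t0]
hfun = @(x,t) 1./(1+exp(-x(1)*(t-x(2))));
errfun = @(x,t,m) m'*m - (hfun(x,t)'*m)^2/(hfun(x,t)'*hfun(x,t));

% Objective of x alone; the data are captured by the anonymous function
obj = @(x) errfun(x,T,M);

% Initial guess read from the contour map, and tighter tolerances (assumed)
x0 = [0.1;50];
opts = optimset('TolX',1e-8,'TolFun',1e-8);

% Minimize; xmin holds the location and emin the value of the minimum
[xmin,emin] = fminsearch(obj,x0,opts);
r_fit = xmin(1);
t0_fit = xmin(2);

% Recover the carrying capacity from equation (13)
H = hfun(xmin,T);
K_fit = (H'*M)/(H'*H);
```

The results should land close to the values reported above. Storing the result in xmin rather than min also leaves Matlab’s built-in min function available for later use.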

4.7 Plotting the Resulting Logistic


All that remains to be done is the plotting of the fitted logistic on the plot containing the time
and biomass data from Table 1. This is a simple matter, now that we know that r = 0.1213,
t0 = 45.7748, and K = 5.0949 are the parameter values that minimize the least squares error.
t=linspace(10,75);
y=K./(1+exp(-r*(t-t0)));
plot(T,M,'o',t,y)
We add axis labels and a title.
title('Time evolution of algal sample.')
xlabel('Time (days)')
ylabel('Biomass (mm^2)')
These commands produce an image similar to that in Figure 5.
Well, that’s a pretty impressive fit!

5 Homework
For homework, use the technique of this activity to fit the logistic equation to the United States
population data given in Section 3.2, page 140, Table 2, in Differential Equations, by Polking, Boggess,
and Arnold.

Figure 5: Fitting the logistic. The fitted curve is drawn over the data of Table 1 (biomass in mm^2
versus time in days).