Fitting A Logistic Curve To Data
David Arnold
February 24, 2002
1 Introduction
This activity is based on an excellent article, “Fitting a Logistic Curve to Data,” by Fabio Cavallini,
which appears in the College Mathematics Journal, 1993, Volume 24, Number 3, pages 247-253.
In his article, Dr. Cavallini describes a number of Mathematica routines he designed to fit a logistic
curve to a given set of data. In this activity, we will design Matlab routines that fit the logistic
equation to Dr. Cavallini’s data set in a similar way.
Time (days)   Biomass (mm^2)
11            0.00476
15            0.0105
18            0.0207
23            0.0619
26            0.337
31            0.74
39            1.7
44            2.45
54            3.5
64            4.5
74            5.09
Table 1: Measured data. Time is expressed in days and biomass is expressed in mm^2, since what
is actually measured is the surface covered by biomass in a microscope sample.
We begin by obtaining a plot of the data. First, enter the data in Matlab.
T=[11,15,18,23,26,31,39,44,54,64,74]
M=[0.00476,0.0105,0.0207,0.0619,0.337,0.74,1.7,2.45,3.5,4.5,5.09]
For the upcoming calculations, we need to ensure that our data are stored as column vectors. Matlab’s
transpose operator accomplishes this.
T=T';
M=M';
Next, plot the data as discrete points with the following command.
plot(T,M,'o')
A title and axis labels add a professional touch.
title('Time evolution of algal sample.')
xlabel('Time (days)')
ylabel('Biomass (mm^2)')
These commands will create an image similar to that shown in Figure 1.
[Figure 1: Time evolution of algal sample. Axes: Time (days) versus Biomass (mm^2).]
syms m r K t
m=dsolve('Dm=r*m*(1-m/K)','t')
Matlab responds with the following solution.
m =
K/(1+exp(-r*t)*C1*K)
Letting $C = K C_1$ in this result provides equation (2).
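As an independent numerical sanity check on Matlab’s symbolic answer, the closed form can be substituted back into the logistic equation. The Python sketch below (standard library only) compares a central-difference estimate of $dm/dt$ with $rm(1 - m/K)$; the helper names `logistic` and `ode_residual` and the parameter values K, r, and C are hypothetical choices for illustration, not fitted values.

```python
import math

def logistic(t, K, r, C):
    """Closed-form solution m(t) = K / (1 + C*exp(-r*t)), where C = K*C1."""
    return K / (1.0 + C * math.exp(-r * t))

def ode_residual(t, K, r, C, dt=1e-5):
    """Central-difference estimate of m'(t) minus the right side r*m*(1 - m/K)."""
    slope = (logistic(t + dt, K, r, C) - logistic(t - dt, K, r, C)) / (2.0 * dt)
    m = logistic(t, K, r, C)
    return slope - r * m * (1.0 - m / K)

# Hypothetical parameter values chosen only for illustration.
K, r, C = 5.0, 0.12, 250.0
residuals = [abs(ode_residual(t, K, r, C)) for t in range(0, 81, 5)]
print(f"largest residual: {max(residuals):.2e}")
```

The residuals should be at the level of finite-difference and rounding error, which is what we expect if the formula solves the differential equation.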
A somewhat tricky calculation provides the second derivative of m with respect to t.
m2 =
2*K^3/(1+exp(-r*t)*C1*K)^3*r^2*exp(-r*t)^2*C1^2-K^2/(1+exp(-r*t)*C1*K)^2*r^2*exp(-r*t)*C1
Of course, you will want to simplify this result.
m2=simple(m2)
m2 =
K^2*r^2*exp(-r*t)*C1*(exp(-r*t)*C1*K-1)/(1+exp(-r*t)*C1*K)^3
A little algebraic manipulation (let $C = K C_1$ and multiply numerator and denominator by $e^{3rt}$)
will reveal that this is identical to equation (3).
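The manipulation can be written out step by step, starting from Matlab’s simplified output. The display below is a sketch of that algebra; its last line is a form whose numerator vanishes exactly when $C = e^{rt}$.

```latex
\begin{aligned}
m'' &= \frac{K^2 r^2 e^{-rt} C_1 \left(e^{-rt} C_1 K - 1\right)}{\left(1 + e^{-rt} C_1 K\right)^3} \\
    &= \frac{K r^2 C e^{-rt} \left(C e^{-rt} - 1\right)}{\left(1 + C e^{-rt}\right)^3}
       && (C = K C_1) \\
    &= \frac{K r^2 C e^{rt} \left(C - e^{rt}\right)}{\left(e^{rt} + C\right)^3}
       && (\text{multiply top and bottom by } e^{3rt})
\end{aligned}
```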
The graph of $m = m(t)$ has a point of inflection where the second derivative in equation (3) changes
sign, which can happen only where it is zero. The second derivative provided by equation (3) is zero when
its numerator is zero; that is, when $C = e^{rt}$. So, if we put $C = e^{rt_0}$, where $t_0$ is the time when
the point of inflection occurs, then the solution (2) becomes
$$ m(t) = \frac{K}{1 + e^{-r(t - t_0)}}. \tag{4} $$
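Equation (4) makes two features easy to check: $m(t_0) = K/2$, and the curve switches from concave up to concave down at $t_0$. The Python sketch below verifies both numerically with a second difference; the helper names and the values of K, r, and $t_0$ are hypothetical.

```python
import math

def logistic(t, K, r, t0):
    """Equation (4): m(t) = K / (1 + exp(-r*(t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def second_difference(t, K, r, t0, h=0.01):
    """Finite-difference approximation of m''(t)."""
    return (logistic(t + h, K, r, t0) - 2.0 * logistic(t, K, r, t0)
            + logistic(t - h, K, r, t0)) / h**2

# Hypothetical parameters for illustration only.
K, r, t0 = 5.0, 0.12, 45.0
half_height = logistic(t0, K, r, t0)             # should equal K / 2
before = second_difference(t0 - 5.0, K, r, t0)   # concave up: positive
after = second_difference(t0 + 5.0, K, r, t0)    # concave down: negative
print(half_height, before > 0, after < 0)
```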
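[Figure 2: Time evolution of algal sample. The data points $(t_i, m_i)$ are plotted together with a candidate logistic curve; $(t_i, m(t_i))$ marks the point on the curve with the same abscissa. Axes: Time (days) versus Biomass (mm^2).]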
This warrants some explanation. In Figure 2, we’ve plotted the data set as discrete points and
overlaid a potential solution to our curve fitting goal. Note that the coordinates of a given data
point are $(t_i, m_i)$, while the coordinates of the point with the same abscissa on the fitted curve are
$(t_i, m(t_i))$. The “error” is $m(t_i) - m_i$. Because some of the data points lie below the fitted curve
while others lie above, these errors have different signs and could cancel one another in a sum, so we
square them to ensure that each contribution is nonnegative. That is, the squared error made at the time
value $t_i$ is
$$ \left(m(t_i) - m_i\right)^2. \tag{6} $$
Summing these over our n data points gives the total squared error of equation (5).
The object of the game is to minimize the total squared error presented by equation (5).
This is why we say that we are fitting a logistic curve to the data set in a “least squares sense.”
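The total squared error of equation (5) is easy to evaluate once a candidate curve is chosen. The Python sketch below computes it for the Table 1 data under two candidate curves of the form (4). The first uses the rough estimates $K \approx 5.09$, $r \approx 0.0943$, $t_0 = 44$ obtained later in this activity; the second is a deliberately poor, hypothetical choice. The function names are illustrative only.

```python
import math

# Table 1 data: time in days, biomass in mm^2.
T = [11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74]
M = [0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09]

def logistic(t, K, r, t0):
    """Candidate curve m(t) = K / (1 + exp(-r*(t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def total_squared_error(K, r, t0):
    """Sum of (m(t_i) - m_i)^2 over all data points."""
    return sum((logistic(t, K, r, t0) - m) ** 2 for t, m in zip(T, M))

rough = total_squared_error(5.09, 0.0943, 44.0)
poor = total_squared_error(5.09, 0.5, 20.0)   # hypothetical bad guess
print(f"rough estimate: {rough:.4f}, poor guess: {poor:.4f}")
```

A smaller total squared error means a better fit in the least squares sense.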
4.2 The Power of Linear Algebra
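At this point, Dr. Cavallini takes advantage of the power of linear algebra to simplify the calculations.
He starts by saying that the error given in equation (5) can be written
$$ e = K^2 \langle \mathbf{H}, \mathbf{H} \rangle - 2K \langle \mathbf{H}, \mathbf{M} \rangle + \langle \mathbf{M}, \mathbf{M} \rangle, \tag{9} $$
where $\mathbf{H}$ and $\mathbf{M}$ are the vectors
$$ \mathbf{H} = \langle h(t_1), h(t_2), \ldots, h(t_n) \rangle \quad\text{and}\quad \mathbf{M} = \langle m_1, m_2, \ldots, m_n \rangle. $$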
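This statement definitely warrants some explanation, particularly if you haven’t yet taken linear algebra.
First, consider two vectors
$$ \mathbf{a} = \langle a_1, a_2, \ldots, a_n \rangle \quad\text{and}\quad \mathbf{b} = \langle b_1, b_2, \ldots, b_n \rangle. $$
The dot product of $\mathbf{a}$ and $\mathbf{b}$, written $\langle \mathbf{a}, \mathbf{b} \rangle$, is defined as
$$ \langle \mathbf{a}, \mathbf{b} \rangle = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. \tag{10} $$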
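The dot product possesses a number of useful algebraic properties.
1. The dot product is commutative.
$$ \langle \mathbf{a}, \mathbf{b} \rangle = \langle \mathbf{b}, \mathbf{a} \rangle $$
2. A scalar can be moved about according to the following rule.
$$ \langle c\mathbf{a}, \mathbf{b} \rangle = \langle \mathbf{a}, c\mathbf{b} \rangle = c \langle \mathbf{a}, \mathbf{b} \rangle $$
3. The dot product distributes over vector addition and subtraction.
$$ \langle \mathbf{a} \pm \mathbf{b}, \mathbf{c} \rangle = \langle \mathbf{a}, \mathbf{c} \rangle \pm \langle \mathbf{b}, \mathbf{c} \rangle $$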
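The Pythagorean Theorem holds equally well in n dimensions, and we define the length of a
vector $\mathbf{a}$, denoted by $\|\mathbf{a}\|$, by
$$ \|\mathbf{a}\|^2 = a_1^2 + a_2^2 + \cdots + a_n^2. \tag{11} $$
It is important to note that
$$ \|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a} \rangle. \tag{12} $$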
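Definitions (10) through (12) translate directly into code. The Python sketch below uses hypothetical helpers `dot` and `norm_squared` and small integer vectors chosen only for illustration, so the commutativity, scalar rule, and $\|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a} \rangle$ checks are exact.

```python
def dot(a, b):
    """Equation (10): <a, b> = a1*b1 + ... + an*bn."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def norm_squared(a):
    """Equation (11): ||a||^2 = a1^2 + ... + an^2."""
    return sum(x * x for x in a)

a = [1, 2, 3]
b = [4, -5, 6]
c = 3
print(dot(a, b), dot(b, a))                       # commutativity
print(dot([c * x for x in a], b),                 # <ca, b>
      dot(a, [c * x for x in b]),                 # <a, cb>
      c * dot(a, b))                              # c<a, b>
print(norm_squared(a), dot(a, a))                 # equation (12)
```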
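We are now in a position to explain Dr. Cavallini’s statement in equation (9). First, note that
$$
\begin{aligned}
e &= \sum_{i=1}^{n} \left(m(t_i) - m_i\right)^2 \\
  &= \left(m(t_1) - m_1\right)^2 + \cdots + \left(m(t_n) - m_n\right)^2 \\
  &= \left(K h(t_1) - m_1\right)^2 + \cdots + \left(K h(t_n) - m_n\right)^2 \\
  &= \left\| \langle K h(t_1) - m_1, \ldots, K h(t_n) - m_n \rangle \right\|^2 \\
  &= \left\| K \langle h(t_1), \ldots, h(t_n) \rangle - \langle m_1, \ldots, m_n \rangle \right\|^2 \\
  &= \| K\mathbf{H} - \mathbf{M} \|^2.
\end{aligned}
$$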
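Now, using equation (12) and the algebraic properties of the dot product, we can write
$$
\begin{aligned}
e &= \| K\mathbf{H} - \mathbf{M} \|^2 \\
  &= \langle K\mathbf{H} - \mathbf{M}, K\mathbf{H} - \mathbf{M} \rangle \\
  &= K^2 \langle \mathbf{H}, \mathbf{H} \rangle - K \langle \mathbf{H}, \mathbf{M} \rangle - K \langle \mathbf{M}, \mathbf{H} \rangle + \langle \mathbf{M}, \mathbf{M} \rangle \\
  &= K^2 \langle \mathbf{H}, \mathbf{H} \rangle - 2K \langle \mathbf{H}, \mathbf{M} \rangle + \langle \mathbf{M}, \mathbf{M} \rangle.
\end{aligned}
$$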
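The identity $\|K\mathbf{H} - \mathbf{M}\|^2 = K^2\langle \mathbf{H},\mathbf{H}\rangle - 2K\langle \mathbf{H},\mathbf{M}\rangle + \langle \mathbf{M},\mathbf{M}\rangle$ can also be confirmed numerically. The Python sketch below assumes $h(t) = 1/(1 + e^{-r(t - t_0)})$, the form computed by the myerror routine later in this activity, uses hypothetical values of K, r, and $t_0$, and evaluates both sides on the Table 1 data.

```python
import math

# Table 1 data: time in days, biomass in mm^2.
T = [11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74]
M = [0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical parameter values for illustration.
K, r, t0 = 5.0, 0.1, 45.0
H = [1.0 / (1.0 + math.exp(-r * (t - t0))) for t in T]

# Left side: squared length of K*H - M.
diff = [K * h - m for h, m in zip(H, M)]
lhs = dot(diff, diff)
# Right side: the expansion in equation (9).
rhs = K**2 * dot(H, H) - 2 * K * dot(H, M) + dot(M, M)
print(lhs, rhs)
```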
4.3 Minimizing
In equation (9), it’s important to note that the total squared error e still contains three parameters,
K, r, and $t_0$. The idea is to adjust these parameters so as to minimize e. In single variable calculus,
you find a minimum by taking the first derivative and setting it equal to zero. The idea is the same in
multivariable calculus; the only difference is that we must take partial derivatives with respect
to each parameter. In this case, we set $\partial e / \partial K$ equal to zero and solve for K.
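$$ \frac{\partial e}{\partial K} = 2K \langle \mathbf{H}, \mathbf{H} \rangle - 2 \langle \mathbf{H}, \mathbf{M} \rangle = 0, $$
so
$$ K = \frac{\langle \mathbf{H}, \mathbf{M} \rangle}{\langle \mathbf{H}, \mathbf{H} \rangle}. \tag{13} $$
Substituting (13) into equation (9) eliminates K:
$$ e = \langle \mathbf{M}, \mathbf{M} \rangle - \frac{\langle \mathbf{H}, \mathbf{M} \rangle^2}{\langle \mathbf{H}, \mathbf{H} \rangle}. \tag{14} $$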
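For fixed r and $t_0$, equation (9) is a parabola in K, so equation (13) gives its lowest point and equation (14) its lowest value. The Python sketch below checks this on the Table 1 data, holding r and $t_0$ at the rough estimates obtained later in this activity; the helper names are hypothetical.

```python
import math

# Table 1 data: time in days, biomass in mm^2.
T = [11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74]
M = [0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def full_error(K, H, M):
    """Equation (9): e = K^2<H,H> - 2K<H,M> + <M,M>."""
    return K**2 * dot(H, H) - 2 * K * dot(H, M) + dot(M, M)

# r and t0 held fixed at rough estimates.
r, t0 = 0.0943, 44.0
H = [1.0 / (1.0 + math.exp(-r * (t - t0))) for t in T]

K_best = dot(H, M) / dot(H, H)                       # equation (13)
e_reduced = dot(M, M) - dot(H, M) ** 2 / dot(H, H)   # equation (14)
print(K_best, e_reduced, full_error(K_best, H, M))
```

Moving K away from `K_best` in either direction increases the error by $(\Delta K)^2 \langle \mathbf{H}, \mathbf{H} \rangle$.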
Equation (14) contains just two parameters, r and $t_0$, the parameter K having been eliminated. As we
shall find, this reduction in the number of parameters makes equation (14) much easier to deal with.
$$ m'(t) = \frac{K r e^{-r(t - t_0)}}{\left(1 + e^{-r(t - t_0)}\right)^2}. \tag{15} $$
Thus, since $e^{0} = 1$ at $t = t_0$,
$$ m'(t_0) = \frac{Kr}{4}, $$
or, equivalently,
$$ r = \frac{4 m'(t_0)}{K}. \tag{16} $$
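Equations (15) and (16) can be checked numerically: a central difference at $t_0$ should reproduce $Kr/4$, and plugging that slope into (16) should recover r. The Python sketch below does this with hypothetical parameter values.

```python
import math

def logistic(t, K, r, t0):
    """Equation (4): m(t) = K / (1 + exp(-r*(t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Hypothetical parameter values for illustration.
K, r, t0 = 5.0, 0.12, 45.0
dt = 1e-5
slope = (logistic(t0 + dt, K, r, t0) - logistic(t0 - dt, K, r, t0)) / (2 * dt)
print(slope, K * r / 4)        # slope at the inflection point vs. K*r/4
r_recovered = 4 * slope / K    # equation (16)
print(r_recovered)
```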
If we examine the data plotted in Figure 1, it would appear that the point of inflection occurs near
the data point with t-value t = 44, so let’s take $t_0 = 44$. We can get a close approximation for the slope
at $t_0$, that is, $m'(t_0)$, by calculating the slope of the line passing through the data points immediately
preceding and following the point of inflection, that is, (39, 1.7) and (54, 3.5). Thus,
$$ m'(t_0) \approx \frac{3.5 - 1.7}{54 - 39} = 0.12. \tag{17} $$
Substitute this result in equation (16), using the largest biomass as an approximation for K, that
is, $K \approx 5.09$. Thus,
$$ r \approx \frac{4(0.12)}{5.09} \approx 0.0943. \tag{18} $$
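The two estimates (17) and (18) amount to a few lines of arithmetic on the Table 1 values. The Python sketch below reproduces them; the variable names are illustrative.

```python
# Table 1 points on either side of the apparent inflection at t = 44.
t_before, m_before = 39, 1.7
t_after, m_after = 54, 3.5
K_est = 5.09  # largest measured biomass, used as a rough K

slope_est = (m_after - m_before) / (t_after - t_before)   # equation (17)
r_est = 4 * slope_est / K_est                             # equation (18)
print(f"m'(t0) ~ {slope_est:.2f}, r ~ {r_est:.4f}")
```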
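Thus, as a starting point, let’s suppose that r falls somewhere in the range $0.01 \le r \le 0.6$, which
comfortably contains this estimate, and that $t_0$ lies within the time span of the data, $11 \le t_0 \le 74$.
If we find these ranges inadequate, we will adjust them and try again.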
r=linspace(0.01,0.6,40);
t0=linspace(11,74,40);
[r,t0]=meshgrid(r,t0);
The meshgrid command creates matrices r and t0 that define a “grid” of points, with r-values
consisting of 40 equally spaced values from 0.01 to 0.6, and t0-values consisting of 40 equally spaced
values from 11 to 74. After the command [r,t0]=meshgrid(r,t0), both r and t0 are matrices having 40
rows and 40 columns, each containing 1600 entries.
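For readers without Matlab, the grid construction can be mimicked in a few lines. The Python sketch below uses hypothetical helpers `linspace` and `meshgrid` written to follow Matlab’s conventions as described above: n equally spaced values including both endpoints, and output matrices with one row per t0-value and one column per r-value.

```python
def linspace(a, b, n):
    """n equally spaced values from a to b inclusive, like Matlab's linspace."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]

def meshgrid(x, y):
    """Matlab-style meshgrid: X[i][j] = x[j] and Y[i][j] = y[i]."""
    X = [[xj for xj in x] for _ in y]
    Y = [[yi for _ in x] for yi in y]
    return X, Y

r_values = linspace(0.01, 0.6, 40)
t0_values = linspace(11, 74, 40)
R, T0 = meshgrid(r_values, t0_values)
print(len(R), len(R[0]), sum(len(row) for row in R))
```

Each row of R repeats the r-values, and each column of T0 repeats the t0-values, so the pair (R[i][j], T0[i][j]) runs over all 1600 grid points.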
Next, we need to evaluate e at each point of the grid. We will do this by writing a Matlab function
to do the work for us. However, because we later intend to call one of Matlab’s optimization
routines to find the minimal e-value on our surface, we must write the function in the form that routine
expects: the parameters being optimized are gathered into a single vector passed as the first argument,
and the data are passed as additional arguments.
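function e=myerror(x,t,m)
% MYERROR  Save as myerror.m. Returns the reduced error of equation (14)
% for x=[r;t0], given column vectors t (time) and m (biomass).
r=x(1);
t0=x(2);
h=1./(1+exp(-r*(t-t0)));   % h(t_i) at every data time, equation (8)
e=m'*m-(h'*m)^2/(h'*h);    % <M,M> - <H,M>^2/<H,H>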
This routine warrants some additional explanation.
First, note that the inputs to the routine, x, t, and m, are special. The variable x contains a
column vector whose first entry is r and whose second entry is $t_0$. The first two lines in the
function pluck these values from the vector x and store them in the variables r and t0,
respectively.
Second, the column vectors t and m hold the time and biomass data presented in Table 1.
Next, the vector h is computed using equation (8). Note the use of Matlab’s elementwise division
operator ./, which is required because this expression divides 1 by a vector of values. In this manner,
we compute $h(t_i)$ for each time value in Table 1.
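The last line of the routine is especially tricky, but less so once you know the following fact from
linear algebra. If $\mathbf{a}$ and $\mathbf{b}$ are column vectors,
$$ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, $$
then
$$ \langle \mathbf{a}, \mathbf{b} \rangle = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \mathbf{a}^T \mathbf{b}. $$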
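Because Matlab’s ' operator turns a real column vector into a row vector, m'*m computes the dot product
$\langle m, m \rangle$, h'*m computes the dot product $\langle h, m \rangle$, and h'*h computes the dot product
$\langle h, h \rangle$. Hence the line e=m'*m-(h'*m)^2/(h'*h) computes the error as defined by equation (14).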
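Before calling mesh, the matrix e must hold the error at every point of the grid. The Python sketch below is a translation of the myerror routine (with `dot` standing in for the transpose-times-vector products) followed by one way to fill that matrix and locate its smallest entry; the helper names are hypothetical, and the grid minimum is only a rough starting point, not the optimum.

```python
import math

# Table 1 data: time in days, biomass in mm^2.
T = [11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74]
M = [0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09]

def dot(a, b):
    """Dot product, playing the role of a'*b in Matlab."""
    return sum(x * y for x, y in zip(a, b))

def myerror(x, t, m):
    """Translation of the Matlab myerror routine: equation (14) at x = [r, t0]."""
    r, t0 = x[0], x[1]
    h = [1.0 / (1.0 + math.exp(-r * (ti - t0))) for ti in t]
    return dot(m, m) - dot(h, m) ** 2 / dot(h, h)

def linspace(a, b, n):
    """n equally spaced values from a to b inclusive."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]

r_values = linspace(0.01, 0.6, 40)
t0_values = linspace(11, 74, 40)

# E[i][j] is the error at r = r_values[j], t0 = t0_values[i],
# the same layout as the matrices produced by meshgrid.
E = [[myerror([r, t0], T, M) for r in r_values] for t0 in t0_values]

# Smallest entry on the grid.
e_best, i_best, j_best = min(
    (E[i][j], i, j) for i in range(len(t0_values)) for j in range(len(r_values))
)
r_best, t0_best = r_values[j_best], t0_values[i_best]
print(f"grid minimum e = {e_best:.4f} at r = {r_best:.4f}, t0 = {t0_best:.2f}")
```

By the Cauchy-Schwarz inequality every entry of E is nonnegative, and it is zero only when the data lie exactly on a logistic curve with that r and $t_0$.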
mesh(r,t0,e)
We add a title and some appropriate axis labels.
xlabel('r')
ylabel('t_0')
zlabel('e')
title('Plotting the error versus the parameters r and t_0')
These commands produce the image shown in Figure 3.
[Figure 3: Plotting the error versus the parameters r and t_0. Mesh surface of e over the (r, t_0) grid.]
The solution to the least squares problem lies in our ability to locate the minimum error on this
surface, i.e., the lowest point on this surface over the given domain. This can be approximated by
clicking the “rotate” icon in the figure window and examining the surface from different viewing
angles by dragging the figure with the mouse.
However, we can get a better indication of where the minimum lies on this surface by crafting a
contour plot. This is easy to do in Matlab and requires no further preparation on our part. We feed
Matlab’s contour command the same data we gave the mesh command, together with a fourth argument
that requests 40 contour levels.
contour(r,t0,e,40)
Again, we add labels and a title.
xlabel('r')
ylabel('t_0')
title('A contour map of the error versus r and t_0.')
[Figure 4: A contour map of the error versus r and t_0.]
These commands were used to craft the image in Figure 4. The contour map suggests the existence
of a minimum near the point $(r, t_0) = (0.1, 50)$. You could obtain a better approximation by
refining the contour map, adding more contours, or specifying the values at which you want the contours
drawn. Type help contour at the Matlab prompt for a full discussion of the capabilities of
Matlab’s contour command. We only need a rough estimate to use as a starting point for Matlab’s
sophisticated optimization routines, so we’ll settle for $(r, t_0) = (0.1, 50)$.
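% The anonymous function passes the data T and M through to myerror; the
% name xmin avoids shadowing Matlab's built-in min function.
xmin=fminsearch(@(x) myerror(x,T,M),[0.1;50])
xmin =
0.1213
45.7748
Hence, the minimum error on the error surface is located at r = 0.1213 and $t_0$ = 45.7748.
It remains to compute K using equation (13). First, we pluck r and t0 from the variable xmin.
r=xmin(1);
t0=xmin(2);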
Next, we compute the vector H.
H=1./(1+exp(-r*(T-t0)));
Finally, we compute the carrying capacity K using equation (13).
K=(H'*M)/(H'*H)
K =
5.0949
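fminsearch uses the Nelder-Mead simplex method. As a stand-in outside Matlab, the Python sketch below minimizes equation (14) with a simpler derivative-free technique, compass (coordinate pattern) search, which is not fminsearch’s algorithm; the function names, starting step sizes, and stopping tolerance are assumptions. Starting from (0.1, 50), it should land close to the values reported above, after which equation (13) gives K.

```python
import math

# Table 1 data: time in days, biomass in mm^2.
T = [11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74]
M = [0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def h_vector(r, t0):
    """H = <h(t_1), ..., h(t_n)> with h(t) = 1 / (1 + exp(-r*(t - t0)))."""
    return [1.0 / (1.0 + math.exp(-r * (t - t0))) for t in T]

def reduced_error(x):
    """Equation (14) as a function of x = [r, t0]."""
    H = h_vector(x[0], x[1])
    return dot(M, M) - dot(H, M) ** 2 / dot(H, H)

def compass_search(f, x0, steps, min_step=1e-7, max_iter=100000):
    """Try +/- a step along each coordinate, accept any improvement,
    and halve all steps when no move improves f."""
    x, fx, steps = list(x0), f(x0), list(steps)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * steps[i]
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]
            if max(steps) < min_step:
                break
    return x, fx

(r_opt, t0_opt), e_opt = compass_search(reduced_error, [0.1, 50.0], [0.01, 1.0])
H = h_vector(r_opt, t0_opt)
K_opt = dot(H, M) / dot(H, H)   # equation (13)
print(f"r = {r_opt:.4f}, t0 = {t0_opt:.4f}, K = {K_opt:.4f}, e = {e_opt:.4f}")
```

The step sizes differ by a factor of 100 because r and $t_0$ live on very different scales; equal steps would make the search either far too coarse in r or far too slow in $t_0$.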
5 Homework
For homework, use the technique of this activity to fit the logistic equation to the United States
population data given in Section 3.2, page 140, Table 2, in Differential Equations, Polking, Boggess,
and Arnold.
[Figure: Time evolution of algal sample. Axes: Time (days) versus Biomass (mm^2).]