HL Math IA v2
Ms. Hadden
IB HL Math
15 October 2016
IA Final Rough Draft
A History of Ebola Outbreaks through the SIR Model
Abstract
The first outbreak of Ebola occurred in the country currently recognized as the
Democratic Republic of the Congo in 1976. There have been multiple outbreaks in
different areas since then, but by far the most significant and most deadly was the 2014
outbreak.1 From both a mathematical and epidemiological standpoint, there is much to
be learned from this outbreak.
The SIR model is a method of calculating disease spread, working as a function of time,
from three equations of the number of people Susceptible to, Infected with, and
Recovered from a disease.2 By examining the outbreak of 2014 with this model, it can
be seen that //conclusion
Aim
The Ebola outbreak drew the public's attention to a serious deficiency in awareness and
research. It is a matter of public health and safety that the most accurate mathematical
methods are being used to predict and describe the spread of diseases, particularly
those which are capable of killing in the thousands. By analyzing this most recent
outbreak, I aim to explore the efficacy of the SIR model, and determine its values and
limitations in predicting the spread of Ebola.
Rationale
The SIR model has long been a standard in epidemiological models, as an intersection
of accuracy and simplicity.3 In order for a model to be worthwhile, it needs to be as
accurate as possible, clearly, but it is also important to consider things beyond accuracy.
Precision, as opposed to accuracy, is the ability of results to be replicated and
generalized. Data can be precise but not accurate, or accurate but not precise. When it
comes to modeling disease spread, the precision of the results is just as important as
the accuracy. While models are often used and proved retroactively, their most valuable
function is their ability to predict future disease spreads. If a model is too accurate, it will
not generalize well, meaning that it will lose precision when applied to situations outside
of the original. So a model that is accurately derived from a specific outbreak may
match actual results perfectly, but it must be detailed in order to reach that level of
accuracy. This means that if the same model, accurate in one outbreak, is used to
predict the results of a new outbreak, its results would be less predictive of reality than a
less accurate model. Essentially, a detailed or complex model is not necessarily
superior, and in order to generalize a model, and get the use out of it, a certain level of
accuracy must be sacrificed.
This contradiction, this classic struggle, between precision and accuracy is both a
fundamental principle of scientific study and a complex philosophical discussion, which I
find fascinating. It is similar to the Heisenberg uncertainty principle, which asserts that
it is impossible to measure both the position and the momentum of a particle exactly at
the same time. It has a strong mathematical foundation behind it, because at a certain
small size of measurement, the uncertainty becomes large enough that the measurement
loses all meaning. 4 However, this mathematical equation also makes sense on a
philosophical level. When you focus too much on where an object is, you can't see where
it is going, and vice versa. As in, if you are too focused on one moment in time or point
in your life, you can't properly see where your life is headed. Conversely, if you are too
focused on your future, you can't
properly appreciate each moment. Everything comes down to striking a perfect balance
between the two. The duality of this principle, the intersection of science and philosophy,
is beautiful to me, an art all its own.
The SIR Model is the perfect example of this conflict between precision and accuracy.
Researchers are constantly creating new and increasingly complex models to map the
spread of specific diseases, but the SIR model requires only three functions, and its
principles apply to a host of different diseases. 5 Therefore, by analyzing this significant
Ebola outbreak, the practical efficiency of the SIR model can be explored and
evaluated. In order to be justified, its results should compare well with actual statistics,
while avoiding unnecessarily complicated calculations.
4 (Schombert, 2005)
5 (Weisstein)
Introduction
The 2014 outbreak killed more than five times as many people as all other known Ebola
outbreaks combined. As of January 2016, 11,315 people had been reported as having died
from the disease in six countries: Liberia, Guinea, Sierra Leone, Nigeria, the US and Mali.
The total number of reported cases is about 28,637. On 13 January 2016, the World
Health Organization declared the last of the countries affected, Liberia, to be
Ebola-free.6 As this outbreak has now come to an end, it becomes important to reflect on
the meaningfulness of the data collected. This most recent outbreak drew more attention
to Ebola worldwide than ever before. Ironically, it was also largely caused
by a lack of preparation and serious attention being given to the disease prior to the
outbreak.
The SIR Model
The SIR Model uses the following three variables:
S = number of people that are susceptible to the disease
I = number of people infected with the disease
R = number of people recovered from the disease, with total immunity
The model assumes a fixed population of N people, and only works in a closed system,
where there are no births or deaths not caused by the disease. Therefore the total
population can be written as:
N = S + I + R 7
Although it is a simplification, on short time scales, this use of a closed system is
beneficial for keeping the model neat.
Equation 1:
$$\frac{dS}{dt} = -\beta I S$$
In Equation 1, $\frac{dS}{dt}$ refers to the rate of change of the number of people
susceptible over time. The number of susceptible people decreases at a rate proportional
to both $I$ and $S$, two of the three categories, with $\beta$ as the infection rate. As
people become infected, they are no longer susceptible to the disease. The only way to
leave the set of susceptible people is by becoming infected; therefore the rate of change
of the number of people who are susceptible to the disease is a function of the number of
those who are already susceptible, the number of those who are already infected, and the
amount of contact between the susceptible and infected.
Equation 2:
$$\frac{dR}{dt} = \gamma I$$
$\frac{dR}{dt}$ refers to the rate of change of the number of people recovered over time.
This illustrates that the rate at which people recover is dependent upon the
number of people infected, as in order to become recovered, one must have been
infected. If the duration of the time infected is shorter, then the rate of recovery
increases. Therefore, the recovery rate $\gamma$ is the constant of proportionality
between the number of people infected and the rate at which they recover.
Equation 3:
$$\frac{dI}{dt} = \beta I S - \gamma I$$
In Equation 3, $\frac{dI}{dt}$ is dependent on the number of people susceptible and the
number of people infected, as well as the infection rate of the disease between the two
compartments. Because the total population is fixed, for there to be more infected
people, there must be a decrease in the number of susceptible people:
$$\frac{dI}{dt} = -\frac{dS}{dt} - \frac{dR}{dt}$$
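As a quick check on Equations 1 to 3, the three rates can be written as one small Python function. This is only an illustrative sketch: the function name `sir_rates` and the sample inputs are assumptions chosen for the example, not part of the original working. It shows that the three rates always add up to zero, which is why S + I + R stays equal to N.

```python
def sir_rates(s, i, beta, gamma):
    """Return (dS/dt, dI/dt, dR/dt) for the SIR model at one moment in time."""
    ds = -beta * i * s              # Equation 1: susceptible people becoming infected
    dr = gamma * i                  # Equation 2: infected people recovering
    di = beta * i * s - gamma * i   # Equation 3: gains from S minus losses to R
    return ds, di, dr


# Sample inputs (illustrative): S = 4292419, I = 846, beta = 1.63e-7, gamma = 0.1
ds, di, dr = sir_rates(4292419, 846, beta=1.63e-7, gamma=0.1)

# The total population is fixed, so the three rates must cancel out.
print(abs(ds + di + dr) < 1e-9)
```

Because every person who leaves S enters I, and every person who leaves I enters R, the sum of the rates is zero whatever values are chosen for beta and gamma.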
In addition to these three equations, there are two parameters whose values must be
found, $\gamma$ and $\beta$.
Equation 4:
$$\gamma = \frac{1}{D}$$
The rate at which people recover from the disease is the reciprocal of the duration $D$
of the disease, as a certain individual can only experience one recovery in a given period
of time. For example, if the duration of the time spent infected is 10 days, then the rate
at which an infected person becomes recovered is:
$$\gamma = \frac{1}{10} = 0.1 = 10\%$$
Equation 5:
$$\beta = \frac{M}{S}$$
This equation shows that the infection rate of the disease is dependent on the mortality
rate $M$ and the number of people susceptible to the disease. This value is always between
0 and 1, where a value of 1 suggests a 100% infection rate and a value of 0 suggests a
0% infection rate. For example, if the mortality rate of the population is 50% and the
number of people susceptible is 100, then the rate of infection would be
$$\beta = \frac{0.5}{100} = 0.005 = 0.5\%$$
With $D = 10$:
$$\gamma = \frac{1}{10} = 0.1$$
As discussed earlier, the mortality rate of Ebola is 0.7 and the number of people
susceptible is 4292419.
$$\beta \text{ (the rate of infection)} = \frac{0.7}{4292419} = 1.63 \times 10^{-7}$$
8 (Epatko, 2014)
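The two parameter calculations can be checked with a few lines of Python. This is a minimal sketch of Equations 4 and 5 using the values quoted in the text; the function names `recovery_rate` and `infection_rate` are made up for this example.

```python
def recovery_rate(duration_days):
    """Equation 4: gamma is the reciprocal of the duration of infection."""
    return 1.0 / duration_days


def infection_rate(mortality_rate, susceptible):
    """Equation 5: beta is the mortality rate divided by the number susceptible."""
    return mortality_rate / susceptible


gamma = recovery_rate(10)             # D = 10 days
beta = infection_rate(0.7, 4292419)   # M = 0.7, S = 4292419

print(f"gamma = {gamma:.1f}")   # gamma = 0.1
print(f"beta  = {beta:.3g}")    # beta  = 1.63e-07
```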
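In order to use the SIR model to predict the evolution of the disease, it would be helpful
if we could solve the system of differential equations. Unfortunately, we cannot
completely solve these equations with an explicit formula solution. 11
Instead, the values of $\frac{dS}{dt}$, $\frac{dI}{dt}$ and $\frac{dR}{dt}$ can be
calculated at each point in time and used to step forward one day at a time. The next S
value is the current S value $+ \frac{dS}{dt}$ for that point in time, and likewise for I
and R. Here can be seen the transition from t = 0 to t = 1. Using Equations 1, 2 and 3
from earlier, the following values for the three rates of change of S, I and R can be
calculated.
$$\left.\frac{dS}{dt}\right|_{t=0} = -\beta I S \approx -581$$
$$\left.\frac{dI}{dt}\right|_{t=0} = \beta I S - \gamma I \approx 496$$
$$\left.\frac{dR}{dt}\right|_{t=0} = \gamma I = 0.1 \times 846 \approx 85$$
11 (Matemtic, 2013)
Therefore, at t = 1,
$$S(1) = 4292419 - 581 = 4291838$$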
The following table shows the results of this calculation over a two-month period.
t   Susceptible  Infected  Recovered  dS/dt    dI/dt    dR/dt   S+I+R
0   4292419      846       735        -581     496      85      4294000
1   4291838      1342      820        -922     788      134     4294000
2   4290916      2130      954        -1462    1249     213     4294000
3   4289454      3379      1167       -2319    1981     338     4294000
4   4287134      5361      1505       -3677    3141     536     4294000
5   4283457      8502      2041       -5827    4977     850     4294000
6   4277631      13478     2891       -9225    7877     1348    4294000
7   4268406      21355     4239       -14585   12449    2136    4294000
8   4253821      33804     6374       -23008   19627    3380    4294000
9   4230814      53432     9755       -36169   30826    5343    4294000
10  4194644      84258     15098      -56549   48123    8426    4294000
11  4138095      132381    23524      -87649   74411    13238   4294000
12  4050446      206792    36762      -134016  113337   20679   4294000
13  3916430      320129    57441      -200602  168589   32013   4294000
14  3715828      488718    89454      -290559  241687   48872   4294000
15  3425269      730405    138326     -400294  327253   73041   4294000
16  3024975      1057658   211366     -511902  406137   105766  4294000
17  2513073      1463795   317132     -588580  442200   146379  4294000
18  1924493      1905995   463512     -586892  396292   190600  4294000
19  1337601      2302288   654111     -492727  262498   230229  4294000
20  844874       2564786   884340     -346707  90229    256479  4294000
21  498167       2655015   1140819    -211622  -53879   265501  4294000
22  286544       2601136   1406320    -119255  -140859  260114  4294000
23  167290       2460277   1666434    -65853   -180175  246028  4294000
24  101437       2280102   1912461    -37006   -191004  228010  4294000
25  64431        2089097   2140471    -21537   -187373  208910  4294000
26  42895        1901724   2349381    -13052   -177121  190172  4294000
27  29843        1724604   2539554    -8235    -164226  172460  4294000
28  21608        1560378   2712014    -5395    -150643  156038  4294000
29  16213        1409735   2868052    -3657    -137316  140973  4294000
30  12556        1272418   3009025    -2556    -124686  127242  4294000
31  10000        1147733   3136267    -1836    -112937  114773  4294000
32  8164         1034796   3251040    -1352    -102128  103480  4294000
33  6812         932668    3354520    -1017    -92250   93267   4294000
34  5796         840418    3447787    -779     -83262   84042   4294000
35  5016         757155    3531828    -608     -75108   75716   4294000
36  4409         682047    3607544    -481     -67724   68205   4294000
37  3927         614324    3675749    -386     -61046   61432   4294000
38  3541         553277    3737181    -313     -55014   55328   4294000
39  3228         498263    3792509    -257     -49569   49826   4294000
40  2971         448694    3842335    -213     -44656   44869   4294000
41  2757         404038    3887205    -178     -40226   40404   4294000
42  2579         363813    3927608    -150     -36231   36381   4294000
43  2429         327581    3963990    -127     -32631   32758   4294000
44  2302         294951    3996748    -109     -29386   29495   4294000
45  2193         265564    4026243    -93      -26463   26556   4294000
46  2100         239101    4052799    -80      -23830   23910   4294000
47  2019         215271    4076709    -70      -21458   21527   4294000
48  1950         193814    4098236    -60      -19321   19381   4294000
49  1889         174493    4117618    -53      -17397   17449   4294000
50  1837         157096    4135067    -46      -15663   15710   4294000
51  1791         141433    4150777    -41      -14103   14143   4294000
52  1750         127330    4164920    -36      -12697   12733   4294000
53  1714         114633    4177653    -31      -11432   11463   4294000
54  1683         103201    4189116    -28      -10292   10320   4294000
55  1655         92909     4199436    -25      -9266    9291    4294000
56  1631         83642     4208727    -22      -8342    8364    4294000
57  1609         75300     4217091    -19      -7511    7530    4294000
58  1589         67789     4224621    -17      -6762    6779    4294000
59  1572         61028     4231400    -15      -6087    6103    4294000
60  1557         54940     4237503    -14      -5480    5494    4294000
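The day-by-day calculation behind the table can be sketched in Python as a simple forward-stepping (Euler) loop. This is an illustration under stated assumptions rather than the spreadsheet actually used: the function name `euler_sir` is hypothetical, and the value beta = 1.6e-7 is an assumption, chosen because the tabulated rates appear to correspond to beta rounded to two significant figures (for example, 1.6e-7 x 846 x 4292419 is about 581, the first dS/dt entry, whereas 1.63e-7 would give about 592).

```python
def euler_sir(s0, i0, r0, beta, gamma, days):
    """Step the SIR equations forward one day at a time.

    Returns a list of (day, S, I, R) tuples, starting with day 0.
    """
    s, i, r = float(s0), float(i0), float(r0)
    rows = [(0, s, i, r)]
    for day in range(1, days + 1):
        ds = -beta * i * s   # Equation 1
        dr = gamma * i       # Equation 2
        di = -ds - dr        # Equation 3
        # Next value = current value + rate of change for that day.
        s, i, r = s + ds, i + di, r + dr
        rows.append((day, s, i, r))
    return rows


# Starting values from the first row of the table.
# beta = 1.6e-7 is an assumption (see the note above); gamma = 0.1 is from the text.
rows = euler_sir(4292419, 846, 735, beta=1.6e-7, gamma=0.1, days=60)

for day, s, i, r in rows[:5]:
    print(day, round(s), round(i), round(r), round(s + i + r))

peak_day, _, peak_i, _ = max(rows, key=lambda row: row[2])
print("largest I:", round(peak_i), "on day", peak_day)
```

Each step uses only the previous day's values, so a one-day step size is a coarse approximation; a smaller step would follow the differential equations more closely but would no longer line up with the table row by row.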
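[Figure: spreadsheet used to generate the table above, with columns for t, S, I, R, dS/dt,
dI/dt, dR/dt and S+I+R, and separate cells for gamma and beta. Each row after the first
adds the previous row's rate of change to the previous row's value, and the S+I+R column
sums the three categories.]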
Data Analysis
The results show an initial steep increase in the number of infected people that
eventually levels out and falls, while at the same time the number of recovered people
increases. The three equations relate to each other in a way that fits with the way Ebola
was likely spread, with a large increase at the beginning that gradually decreases as
awareness of the disease spreads. The peak in I could also be found by taking the
derivative of I, which is $\frac{dI}{dt}$, and finding where it is equal to zero. Checking
the table, we see that the derivative of I goes from positive to negative between t = 20
and t = 21, meaning that with this model, the number of patients actively experiencing
Ebola peaks about 21 days into the spread of the disease, where the table shows its
largest value of I, 2655015.
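The sign change in the derivative can also be located programmatically. The short Python sketch below uses only rows copied from the table (t = 18 to t = 22); the helper name `sign_change_interval` is made up for illustration. It also checks the condition for the peak that follows from Equation 3: since dI/dt = I(beta S - gamma), the derivative is zero when S = gamma / beta, here computed with gamma = 0.1 and beta = 1.63e-7 from the text.

```python
# Excerpt of the table: (t, S, I, dI/dt)
rows = [
    (18, 1924493, 1905995, 396292),
    (19, 1337601, 2302288, 262498),
    (20, 844874, 2564786, 90229),
    (21, 498167, 2655015, -53879),
    (22, 286544, 2601136, -140859),
]


def sign_change_interval(rows):
    """Return (t_before, t_after) where dI/dt goes from positive to non-positive."""
    for (t0, _, _, d0), (t1, _, _, d1) in zip(rows, rows[1:]):
        if d0 > 0 and d1 <= 0:
            return t0, t1
    return None


interval = sign_change_interval(rows)
peak_row = max(rows, key=lambda row: row[2])   # row with the largest I

gamma, beta = 0.1, 1.63e-7
threshold_s = gamma / beta   # dI/dt = I * (beta * S - gamma) = 0 when S = gamma / beta

print("dI/dt changes sign between t =", interval)
print("largest tabulated I is at t =", peak_row[0])
print("I stops growing once S falls below about", round(threshold_s))
```

The threshold value of S lies between the tabulated S values at t = 20 and t = 21, which agrees with the sign change seen in the dI/dt column.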
Also note that the number of susceptible people will never reach zero, only tending
towards it, because the only way for the entire population to be unsusceptible would be
a complete wipe of the population or the introduction of a vaccine.
Discussion of the SIR model
Values
It is a very quick and straightforward model. With minimal outside data, we were able to
realistically model the spread of Ebola. As the efficiency of computing increases, this
becomes more and more important. It also has clearly defined parameters for such
outside data, like the mortality rate of a disease, making it easier and more valid to
generalize to another disease.
Limitation
The calculated beta and gamma values are often inaccurate, and a small deviation from
the correct value can result in great changes in the overall model. For example, changing
the gamma value from 0.1 to 0.3 changes the results substantially.
In this situation, a skewed value in the duration of sickness can drastically alter the
results.
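The sensitivity to gamma can be illustrated with a short Python sketch that runs the same day-by-day calculation twice. The function name `peak_infected`, the 120-day horizon and the use of beta = 1.63e-7 from the text are assumptions made for this example, so the exact numbers it prints will not necessarily match the original graphs.

```python
def peak_infected(beta, gamma, s0=4292419, i0=846, days=120):
    """Run the day-by-day SIR calculation and return (day, value) of the largest I."""
    s, i = float(s0), float(i0)
    best_day, best_i = 0, i
    for day in range(1, days + 1):
        new_infections = beta * i * s   # people moving from S to I
        recoveries = gamma * i          # people moving from I to R
        s, i = s - new_infections, i + new_infections - recoveries
        if i > best_i:
            best_day, best_i = day, i
    return best_day, best_i


# Same beta, two different recovery rates (D = 10 days versus D of about 3.3 days).
for gamma in (0.1, 0.3):
    day, peak = peak_infected(beta=1.63e-7, gamma=gamma)
    print(f"gamma = {gamma}: peak of about {peak:,.0f} infected on day {day}")
```

With the larger gamma, infected people leave the I category more quickly, so the peak is both lower and later.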
A main weakness of this model is that it relies on a closed system, meaning it
cannot and does not account for any births or any deaths caused by something other
than the disease. This is, of course, unrealistic. On a small scale, the differences may
be negligible, but before too much weight is placed on the SIR model's predictions, a
way to compensate for this would need to be created.
Now that the outbreak has been officially declared ended, we can compare the SIR model's
predictions to the actual outcome in Liberia, using statistics from the WHO.12
Time (days)  I model   I actual
10           84258     1378
15           730405    1680
20           2564786   1871
25           2089097   2046
30           1272418   2407
35           757155    3022
40           448694    3280
45           265564    3696
50           157096    3834
55           92909     4076
60           54940     4262
Because of the difference in scale when the two data sets are graphed on the same set of
axes, the actual I data appears like a graph of y = 0 in comparison to the SIR model's
results. Therefore, it needs to be graphed separately to see the actual shape of the data.
[Graph: I(t) actual]
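The size of the gap between the two data sets can be put into numbers with a few lines of Python. The sketch below uses only the values from the comparison table; the variable names are illustrative.

```python
# (time in days, I from the SIR model, I actual), copied from the comparison table
comparison = [
    (10, 84258, 1378),
    (15, 730405, 1680),
    (20, 2564786, 1871),
    (25, 2089097, 2046),
    (30, 1272418, 2407),
    (35, 757155, 3022),
    (40, 448694, 3280),
    (45, 265564, 3696),
    (50, 157096, 3834),
    (55, 92909, 4076),
    (60, 54940, 4262),
]

for day, model, actual in comparison:
    print(f"day {day}: model is about {model / actual:,.0f} times the actual value")

# Day on which the model overestimates by the largest factor.
worst_day, worst_model, worst_actual = max(comparison, key=lambda row: row[1] / row[2])
print("largest ratio on day", worst_day)
```

A ratio in the hundreds or thousands explains why the actual data looks flat when both are drawn on the same axes.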
We can see that the SIR model has significantly inflated the number of people who were
infected with Ebola, and the overall shape of the graph is quite different. As discussed
earlier, however, a different gamma value can change the SIR model drastically, and gamma
is difficult to calculate accurately. Accordingly, I was able to find a different gamma
value (the rate of recovery) that generated a result similar to the actual data. It is
graphed below in blue against the actual data, with a gamma value of 0.679995559.
In order to get a graph as close as this is to the actual data, I had to use nine significant
figures, and it still is not an exact match. This demonstrates the level of precision
the SIR model requires in its parameters, which is hard to achieve because the gamma
value is calculated through extreme simplification.
Conclusion
This exploration was able to evaluate the effectiveness of the SIR model as an
intersection of precision and accuracy. Clearly, after being compared to actual data, the
model cannot accurately account for all of the variables that affect disease spread, and
resulted in a prediction very different from reality. However, the model, once adjusted
for an accurate rate of recovery, produced a remarkably similar result with a relatively
small number of calculations involved. Therefore, while not being the most accurate
model for the spread of Ebola, the SIR model was able to be precise, and therefore
maintains an important role in the modeling of disease spread.
Bibliography
BBC News. (2016). Ebola: Mapping the Outbreak. British Broadcasting Corporation.
Dolgoarshinnykh, R., & Lalley, S. P. (2002). Epidemic Modeling: SIRS Models.
Epatko, L. (2014, October 16). 70 percent Ebola death rate? Here's how they calculate
it. Retrieved from PBS News Hour:
http://www.pbs.org/newshour/rundown/70-percent-ebola-death-rate-calculate/
IB Maths Resources from British International School Phuket. (2014). Modelling
Infectious Diseases.
Schombert, J. (2005, April 21). Uncertainty Principle. (U. o. Oregon, Producer)
Retrieved from 21st Century Science:
http://abyss.uoregon.edu/~js/21st_century_science/lectures/lec14.html
Smith, D., & Moore, L. (2004, December). The SIR Model for Spread of Disease - The
Differential Equation Model. Retrieved from Mathematical Association of America:
http://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-the-differential-equation-model