ExamplesR Power Law
Colin S. Gillespie
The Moby Dick data set can be downloaded from
http://tuvalu.santafe.edu/aaronc/powerlaws/data.htm
or loaded directly from the package:
library("poweRlaw")
data("moby")
To fit a discrete power law to this data,[1] we use the displ constructor:
m_pl = displ$new(moby)
The resulting object, m_pl, is a displ[2] object. It also inherits from the discrete distribution class.
After creating the displ object, a typical first step is to infer the model parameters.[3] We can
estimate the lower threshold via
est = estimate_xmin(m_pl)
m_pl$setXmin(est)
For a given value of xmin, the scaling parameter is estimated by numerically optimising the
log-likelihood. The optimiser is initialised using the analytical MLE

\hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \log \frac{x_i}{x_{\min} - 0.5} \right]^{-1}.
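As a rough illustration of the starting value used by the optimiser, the formula above can be evaluated directly in base R. This is a minimal sketch rather than the package's internal code; the function name alpha_init and the toy data are assumptions made for the example.

```r
# Analytical approximation to the discrete power-law MLE,
# used here only to illustrate the formula above.
# alpha_init is a hypothetical helper, not part of poweRlaw.
alpha_init = function(x, xmin) {
  tail = x[x >= xmin]            # observations at or above the threshold
  n = length(tail)
  1 + n / sum(log(tail / (xmin - 0.5)))
}

x_toy = c(1, 2, 3, 4, 5)         # toy data, for illustration only
alpha_init(x_toy, xmin = 1)
```

Raising xmin shrinks the tail sample, so the estimate is based on fewer observations and becomes more variable.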
This yields a threshold estimate of xmin = 7 and scaling parameter α = 1.95, which matches
the results found in Clauset et al. (2009).
Alternatively, we could perform a parameter scan for each value of xmin
[1] The object moby is a simple R vector.
[2] displ: discrete power law.
[3] When the displ object is first created, the default parameter values are NULL and xmin is set to the minimum
x-value.
Figure 1: Data CDF of the Moby Dick data set. The fitted power law (green line), log-normal
(red line) and Poisson (blue line) distributions are also given. The vertical axis (CDF) is on a log scale.
m_ln = dislnorm$new(moby)
est = estimate_xmin(m_ln)
which yields a lower threshold of xmin = 3 and parameters (-17.9, 4.87). A similar procedure is
applied to fit the Poisson distribution: we create a distribution object using dispois, then fit as
before.
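A minimal sketch of the Poisson fit described above, following the same steps used for the other two models. The object name m_pois is taken from the plotting code; the sketch assumes the poweRlaw package is installed.

```r
library("poweRlaw")
data("moby")

# Create a discrete Poisson distribution object and fit it
# in the same way as the power law and log-normal models.
m_pois = dispois$new(moby)
est = estimate_xmin(m_pois)
m_pois$setXmin(est)
```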
The data CDF and lines of best fit can be easily plotted
plot(m_pl)
lines(m_pl, col=2)
lines(m_ln, col=3)
lines(m_pois, col=4)
to obtain figure 1. It is clear that the Poisson distribution is not appropriate for this data set.
However, the log-normal and power law distributions both provide reasonable fits to the data.
[4] dislnorm: discrete log normal object.
[5] For example, bootstrap(m_ln).
[6] The output of this bootstrapping procedure can be obtained via data(bootstrap_moby).
By default, the bootstrap function will use the maximum likelihood estimate to estimate the
parameter and check all values of xmin. When the possible xmin values are large, it is recommended
that the search space is reduced. For example, passing xmins = seq(2, 20, 2) to bootstrap
restricts the search to the xmin values 2, 4, 6, ..., 20.
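A sketch of a bootstrap call with a reduced search space, assuming the poweRlaw package is installed. The values of no_of_sims and threads are kept deliberately small so the example runs quickly; they are illustrative choices, not recommendations.

```r
library("poweRlaw")
data("moby")

m_pl = displ$new(moby)
# Only consider xmin = 2, 4, ..., 20 in each bootstrap replicate.
# A real analysis would use many more simulations.
bs = bootstrap(m_pl, xmins = seq(2, 20, 2), no_of_sims = 10, threads = 1)
head(bs$bootstraps)
```

Each row of bs$bootstraps holds the estimates from one bootstrap replicate.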
sd(bs$bootstraps[,2])
## [1] 1.879
sd(bs$bootstraps[,3])
## [1] 0.02447
Plotting the bootstrap results gives figure 2. The top row of graphics in figure 2 gives a 95% confidence
interval for the mean estimate of the parameters. The bottom row of graphics gives a 95% confidence
interval for the standard deviation of the parameters. The parameter trim in the plot function controls
the percentage of samples displayed.[8] When trim=0.1, we only display the final 90% of data.
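To make the effect of trim concrete, the sketch below computes a cumulative mean over bootstrap iterations and then drops the first 10% of iterations before display. It only illustrates the idea and is not the package's plotting code; the simulated values and variable names are assumptions.

```r
set.seed(1)
# Stand-in for a column of bootstrap estimates (hypothetical values)
boot_est = rnorm(1000, mean = 1.95, sd = 0.03)

# Cumulative mean after each iteration
iter = seq_along(boot_est)
cum_mean = cumsum(boot_est) / iter

# With trim = 0.1, discard the first 10% of iterations
trim = 0.1
keep = iter > trim * length(boot_est)
shown = cum_mean[keep]
length(shown)   # number of iterations displayed
```

Dropping the early iterations hides the noisy start of the running estimate, which makes convergence easier to judge.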
We can also construct histograms.
hist(bs$bootstraps[,2])
hist(bs$bootstraps[,3])
to get figure 3.
A similar bootstrap analysis can be obtained for the log-normal distribution
[7] For single parameter models, pars should be a vector. For the log-normal distribution, pars should be a matrix
of values.
[8] When trim=0, all iterations are displayed.
Figure 2: Results from the standard bootstrap procedure (for the power law model) using the
Moby Dick data set: bootstrap(m_pl). The top row shows the mean estimate of
parameters xmin and α. The bottom row shows the estimate of standard deviation for
each parameter. The dashed lines give approximate 95% confidence intervals. After
5,000 iterations, the standard deviation of xmin and α is estimated to be 2.1 and 0.03
respectively. (Panels: Xmin and Par 1, each plotted against Iteration.)
bs1 = bootstrap(m_ln)
In this case, we would obtain uncertainty estimates for both of the log-normal parameters.
Figure 3: Characterising uncertainty in parameter values. (a) xmin uncertainty (standard
deviation 2). (b) α uncertainty (standard deviation 0.03). (Histograms; vertical axis: Frequency.)
bs_p = bootstrap_p(m_pl)
The point estimate of the p-value is one of the elements of the bs_p object:[9]
bs_p$p
## [1] 0.6778
plot(bs_p)
to obtain figure 4. The graph in the top right-hand corner gives the cumulative estimate of the
p-value; the final value of the purple line corresponds to bs_p$p. Also given are approximate 95%
confidence intervals.
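The cumulative p-value can be thought of as a running proportion: after each bootstrap replicate, the fraction of simulated goodness-of-fit statistics that are at least as large as the observed one, following the procedure described in Clauset et al. (2009). A minimal base R sketch of that idea, using made-up statistics (gof_obs and gof_sim are hypothetical names and values, not output from poweRlaw):

```r
gof_obs = 0.02                          # observed statistic (made up)
gof_sim = c(0.01, 0.03, 0.02, 0.05)     # simulated statistics (made up)

# Running proportion of simulations with a statistic >= the observed one
exceed = gof_sim >= gof_obs
cum_p = cumsum(exceed) / seq_along(exceed)

# The final value is the point estimate of the p-value
p_hat = cum_p[length(cum_p)]
p_hat
```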
m_ln$setXmin(m_pl$getXmin())
[9] Also given is the average time of a single bootstrap: bs_p$sim_time = 1.75 seconds.
[10] While the bootstrap method is useful, it is computationally intensive and will be unsuitable for most models.
Figure 4: Results from the bootstrap procedure (for the power law model) using the Moby Dick
data set: bootstrap_p(m_pl). The top row shows the mean estimate of parameters
xmin, α and the p-value. The bottom row shows the estimate of standard deviation for
each parameter. The dashed lines give approximate 95% confidence intervals.
(Panels: Xmin, Par 1 and p-value, each plotted against Iteration.)
est = estimate_pars(m_ln)
m_ln$setPars(est)
This comparison gives a p-value of 0.6824. This p-value corresponds to the p-value on page 29 of
the Clauset et al. paper (the paper gives 0.69).
Overall, these results suggest that one model cannot be favoured over the other.
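The comparison referred to above can be carried out with the package's compare_distributions function, which implements Vuong's test (Vuong, 1989). The sketch below assumes the poweRlaw package is installed. To keep it short, it sets xmin = 7 directly (the estimate reported earlier) rather than re-estimating it; both models must share the same threshold.

```r
library("poweRlaw")
data("moby")

# Fit both models above a common threshold
m_pl = displ$new(moby)
m_pl$setXmin(7)
m_pl$setPars(estimate_pars(m_pl))

m_ln = dislnorm$new(moby)
m_ln$setXmin(m_pl$getXmin())
m_ln$setPars(estimate_pars(m_ln))

# Vuong's likelihood ratio test
comp = compare_distributions(m_pl, m_ln)
comp$p_two_sided
```

A large two-sided p-value indicates that the data do not allow one model to be preferred over the other.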
xmins = 1:1500
Figure 5: Estimated parameter values conditional on the threshold, xmin. The horizontal line
corresponds to α = 1.95.
est_scan = 0*xmins
Next, we loop over the xmin values and estimate the parameter value conditional on the xmin value
for(i in seq_along(xmins)){
m_pl$setXmin(xmins[i])
est_scan[i] = estimate_pars(m_pl)$pars
}
The results are plotted in figure 5. For this data set, as the lower threshold increases, so does the
point estimate of α.
The blackouts data set can be downloaded from
http://tuvalu.santafe.edu/aaronc/powerlaws/data/blackouts.txt
and then read into R:
blackouts = read.table("blackouts.txt")
Although the blackouts data set is discrete, since the values are large it makes sense to treat the
data as continuous. Continuous power law objects take vectors as inputs, so
m_bl = conpl$new(blackouts$V1)
est = estimate_xmin(m_bl)
This gives a point estimate of xmin = 50000. We can then update the distribution object
m_bl$setXmin(est)
plot(m_bl)
lines(m_bl, col=2, lwd=2)
m_bl_ln = conlnorm$new(blackouts$V1)
est = estimate_xmin(m_bl_ln)
m_bl_ln$setXmin(est)
It is clear from figure 6 that the log-normal distribution provides a better fit to this data set.
Figure 6: CDF plot of the blackout data set with lines of best fit. Since the minimum value of x is
large, we fit a continuous power law as this is more efficient. The power law fit is the
green line; the log-normal fit is the red line. The vertical axis (CDF) is on a log scale.
data("native_american")
data("us_american")
Each data set is a data frame with two columns. The first column is the number of casualties recorded,
the second the conflict date:
head(native_american, 3)
## Cas Date
## 1 18 1776-07-15
## 2 26 1776-07-20
## 3 13 1776-07-20
The records span around one hundred years, 1776–1890. The data is plotted in figure 7.
It is straightforward to fit a discrete power law to this data set. First, we create discrete power
law objects
Figure 7: Casualty record for the Indian-American war, 1776–1890. Native American casualties
(purple circles) and US American casualties (green triangles). Data taken from
Friedman (2014). The vertical axis (#Casualties) is on a log scale.
Figure 8: Plots of the CDFs for the Native American and US American casualties. The lines of
best fit are also given. Both axes are on a log scale.
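m_na = displ$new(native_american$Cas)
m_us = displ$new(us_american$Cas)
## Estimate the lower threshold for each data set
est_na = estimate_xmin(m_na)
est_us = estimate_xmin(m_us)
m_na$setXmin(est_na)
m_us$setXmin(est_us)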
plot(m_na)
lines(m_na)
## Don't create a new plot, just store the output
d = plot(m_us, draw=FALSE)
points(d$x, d$y, col=2)
lines(m_us, col=2)
The result is given in figure 8. The tails of the distributions appear to follow a power law. This
is consistent with the expectation that smaller-scale engagements are less likely to be recorded.
However, for larger scale engagements, it is very likely that a record is made.
References
J.C. Bohorquez, S. Gourley, A.R. Dixon, M. Spagat, and N.F. Johnson. Common ecology
quantifies human insurgency. Nature, 462(7275):911–914, 2009.
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM
Review, 51(4):661–703, 2009.
J.A. Friedman. Using power laws to estimate conflict size. The Journal of Conflict Resolution,
2014.
M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5):
323–351, 2005.
Q.H. Vuong. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica:
Journal of the Econometric Society, 57:307–333, 1989.