Ecography 40: 887–893, 2017

doi: 10.1111/ecog.03049
© 2017 The Authors. Ecography © 2017 Nordic Society Oikos
Subject Editor: Michael Borregaard. Editor-in-Chief: Miguel Araújo. Accepted 9 March 2017

Opening the black box: an open-source release of Maxent

Steven J. Phillips, Robert P. Anderson, Miroslav Dudík, Robert E. Schapire and Mary E. Blair
S. J. Phillips (http://orcid.org/0000-0002-6991-608X) (mrmaxent@gmail.com) and M. E. Blair, Center for Biodiversity and Conservation,
American Museum of Natural History, New York, NY, USA. – R. P. Anderson, Dept of Biology, City College of New York, City Univ. of New York,
New York, NY, USA, and Program in Biology, Graduate Center, City Univ. of New York, New York, NY, USA, and Div. of Vertebrate Zoology
(Mammalogy), American Museum of Natural History, New York, NY, USA. – M. Dudík and R. E. Schapire, Microsoft Research, New York,
NY, USA.

This software note announces a new open-source release of the Maxent software for modeling species distributions from
occurrence records and environmental data, and describes a new R package for fitting such models. The new release (ver.
3.4.0) will be hosted online by the American Museum of Natural History, along with future versions. It contains small
functional changes, most notably use of a complementary log-log (cloglog) transform to produce an estimate of occurrence
probability. The cloglog transform derives from the recently published interpretation of Maxent as an inhomogeneous
Poisson process (IPP), giving it a stronger theoretical justification than the logistic transform, which it replaces as the default. In
addition, the new R package, maxnet, fits Maxent models using the glmnet package for regularized generalized linear models.
We discuss the implications of the IPP formulation in terms of model inputs and outputs, treating occurrence records
as points rather than grid cells and interpreting the exponential Maxent model (raw output) as an estimate of relative
abundance. With these two open-source developments, we invite others to freely use and contribute to the software.

New hosting and licensing for Maxent

Maxent is a self-contained Java application for species distribution modeling (SDM) based on occurrence records (locations where the species has been found) together with environmental variables such as rainfall and temperature for a surrounding study area (Phillips et al. 2006, Phillips and Dudík 2008). Since performing well in a comparison of species distribution modeling methods (Elith et al. 2006), it has been widely used: Google Scholar reports more than 6000 citations for Phillips et al. (2006) at the time of writing. Until now the software source code has been owned by AT&T, but the application has been freely available and hosted online by Princeton Univ. (< www.cs.princeton.edu/%7Eschapire/maxent >).

Despite documentation of the underlying mathematics, the software has sometimes been referred to as a black box, since the underlying source code was not available. The source code is now released under the MIT open-source licence, and we invite interested developers to use and contribute to the code. The geospatial community has been a leader in open source and free software development (Bocher and Neteler 2012), and many in the ecology, evolution, and environmental science communities have called for increased openness as not only an ethical imperative but also a necessity to answer key pressing questions about global change (Wolkovich et al. 2012). The Maxent application will henceforth be hosted by the Center for Biodiversity and Conservation (CBC) at the American Museum of Natural History, at < https://biodiversityinformatics.amnh.org/open_source/maxent >, extending the role that the CBC has played in fostering the development of Maxent and hosting the New York Species Distribution Modeling Discussion Group for the past 15 yr. In addition to the Maxent application, the new site contains the existing tutorial and a few key publications, as well as a link to the source code on GitHub (< https://github.com/mrmaxent/maxent >).

In addition to the Java source code, we announce a new R package, maxnet (< https://CRAN.R-project.org/package=maxnet >; ver. 0.1.2, < https://github.com/mrmaxent/maxnet >, authored by SJP), which implements Maxent using the glmnet R package (Friedman et al. 2010) for model fitting. This new package takes advantage of the derivation of Maxent as a form of infinitely-weighted logistic regression. It fits Maxent models using the same feature classes (linear, quadratic, hinge, etc.) and regularization options as the Java version.

Maxent and inhomogeneous Poisson processes

Maxent estimates the distribution (geographic range) of a species by finding the distribution which has maximum entropy (i.e. is closest to geographically uniform) subject to constraints derived from environmental conditions at recorded occurrence locations. The constraints are defined in terms of ‘features’ (environmental variables such as temperature, and simple functions of those variables such as quadratic terms), and require that the mean of each feature under the estimated distribution match its sample mean over the occurrence records. This formulation is equivalent to maximizing the likelihood of a parametric exponential distribution (Phillips et al. 2004). More recently, it was noted that the exact same maximum likelihood exponential model can be obtained from an inhomogeneous Poisson process (IPP) (Aarts et al. 2012, Fithian and Hastie 2013, Renner and Warton 2013). This development is important for Maxent users, as it yields new interpretations of model inputs and outputs, and allows the use of other software packages for fitting Maxent models. Here we give a very brief overview of the IPP formulation, following Fithian and Hastie (2013); note that in following their notation, some of the same symbols (e.g. λ) are used differently from previous papers describing Maxent.

An IPP is a widely-used model for a random set Z of points falling in some domain D (Cressie 1993, Diggle 2003). To apply it to species distribution modeling, we can use the set of occurrence records for Z, while D is the geographic study area. The IPP can be defined by an intensity function λ, which assigns a non-negative real-valued intensity λ(z) to each point z in D. It can be thought of as indexing the likelihood that a point (here, an occurrence record of the species) falls at or near z. We can define a probability density over the domain D by:

$$p_\lambda(z) = \frac{\lambda(z)}{\int_D \lambda(z)\,dz} \tag{1}$$

where the denominator simply makes pλ integrate to 1 over D. An IPP with intensity λ is defined as an independent and identically distributed (i.i.d.) sample from pλ, whose size (number of points) is a Poisson random variable with mean $\int_D \lambda(z)\,dz$.

Warton and Shepherd (2010) suggested modeling the occurrence records for a species as arising from an IPP whose intensity λ(z) is a log-linear function of a vector of real-valued features x(z):

$$\lambda(z) = \exp(\alpha + \beta' x(z)) \tag{2}$$

The coefficient α is essentially a normalizing constant, giving no information about the species’ distribution: its maximum likelihood value simply ensures that $\int_D \lambda(z)\,dz$ equals the total number of occurrence records. Conditioned on the number of occurrence records, the likelihood of the IPP is the same as Maxent’s likelihood, since pλ is exactly Maxent’s exponential distribution (α cancels from the ratio in Eq. 1, leaving pλ(z) proportional to exp(β′x(z))). The maximum likelihood values of the coefficient vector β are therefore exactly the same as those given by Maxent. This equivalence still holds when using regularization (as is done by the Maxent software), by which a penalty term (Phillips et al. 2004, Elith et al. 2011) is added to the log likelihood to penalize larger values of the coefficients and thereby produce a simpler model.

Implications for model inputs

While the IPP model can be defined for a finite discrete domain (such as a regular grid), it is perhaps most natural for a continuous D; for SDM, this means that occurrence records are considered to be points (with zero area) in geographic space rather than sites or quadrats of some non-zero area. The IPP then models the process of drawing some point locations randomly from the locations of all individuals of the species (leaving aside complications due to sample selection bias; Phillips et al. 2009, Fithian and Hastie 2013, Renner et al. 2015). Hence, it models occurrence records as being obtained with probability proportional to the local abundance of the species. This contrasts with Phillips et al. (2006), who outlined an idealized data model in which the domain D is a finite grid of equal-sized cells, with occurrence records corresponding to grid cells randomly selected from those occupied by the species. In this latter case, a grid cell with a single individual is considered as likely to have an occurrence record as a grid cell in which the species is very abundant; the difference between the two data models is more pronounced for larger grid sizes. Given the realities of biological sampling, the truth surely lies somewhere between these two extremes, and depends strongly on the data-collection methods used in the field (Renner et al. 2015). For example, the density of records derived from incidental sightings will likely be strongly affected by local abundance; in contrast, presence-only data sets of occurrence records from intensive sampling of transects will not distinguish between areas with a few individuals truly present per transect versus those with many.

For the IPP model, we expect to have more occurrence records in areas and environmental conditions where the species is abundant. However, care should be taken when multiple records lie close together, since co-located or nearby records are often an artifact of spatially autocorrelated sampling (for example, records may be clustered around a research station). For these reasons, the occurrence records may need to be thinned (Boria et al. 2014, Aiello-Lammens et al. 2015) to better match the IPP’s assumption of independent samples. Note that this particular issue of clustered sampling is separate from that of spatially biased sampling (Fithian et al. 2015), and it may be necessary to apply both thinning and bias correction to the same dataset (Syfert et al. 2013).

Additionally, the primary goal of SDM is often to model and understand the environmental conditions inhabited by the species, rather than simply its geographic distribution. Both for this use and to better estimate the geographic distribution, it is important that (to the degree possible) the occurrence data represent a random sample of suitable conditions in D. In addition to consideration of sampling biases (see above), this requires a careful choice of the study area D; see for example Renner et al. (2015) and the discussion of environmental equilibrium and noise assumptions in Anderson (2013).

Implications for model outputs

The IPP model gives an estimate λ(z) of the intensity of occurrence records at or near the point z. If the sampling effort is unbiased (an unlikely assumption; see Reddy and Dávalos 2003), this is also an estimate of the relative abundance of the species: i.e. it is linearly proportional to the average number of individuals per unit area at or near z.
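The IPP definitions in Eqs 1 and 2, and the statements that α only fixes the expected number of records while Maxent’s likelihood depends on β alone, can be checked numerically. The Python sketch below is illustrative and is not code from Maxent or maxnet. It assumes a study area D discretized into equal-area grid cells (so each integral over D becomes a sum times the cell area) and uses NumPy; the function names, the simulated features X, the cell area and the coefficient values are all hypothetical.

```python
import numpy as np


def intensity(alpha, beta, X):
    """Eq. 2: log-linear intensity lambda(z) = exp(alpha + beta'x(z)) for every grid cell."""
    return np.exp(alpha + X @ beta)


def density(alpha, beta, X, cell_area):
    """Eq. 1 on a grid: p_lambda = lambda / (integral of lambda), integral = sum * cell_area."""
    lam = intensity(alpha, beta, X)
    return lam / (lam.sum() * cell_area)


def simulate_ipp(alpha, beta, X, cell_area, rng):
    """Draw N ~ Poisson(integral of lambda), then N i.i.d. cell indices from p_lambda."""
    lam = intensity(alpha, beta, X)
    n = rng.poisson(lam.sum() * cell_area)
    cell_probs = lam / lam.sum()  # probability mass of each cell = p_lambda * cell_area
    return rng.choice(len(lam), size=n, p=cell_probs)


def alpha_mle(beta, X, cell_area, n_records):
    """For fixed beta, the alpha maximizing the IPP likelihood: integral of lambda = n_records."""
    return np.log(n_records) - np.log(np.exp(X @ beta).sum() * cell_area)


def conditional_loglik(alpha, beta, X, cell_area, records):
    """Log-likelihood given N (Maxent's likelihood): sum of log p_lambda over the record cells."""
    return np.log(density(alpha, beta, X, cell_area)[records]).sum()


# Hypothetical example: 1000 cells, two features per cell, true alpha = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
cell_area = 0.01
beta = np.array([1.0, -0.5])
records = simulate_ipp(3.0, beta, X, cell_area, rng)

# alpha cancels from p_lambda, so Maxent's likelihood is the same for any alpha.
print(np.isclose(conditional_loglik(0.0, beta, X, cell_area, records),
                 conditional_loglik(5.0, beta, X, cell_area, records)))  # True

# The ML alpha makes the expected number of records equal the observed number.
a_hat = alpha_mle(beta, X, cell_area, len(records))
print(np.isclose(intensity(a_hat, beta, X).sum() * cell_area, len(records)))  # True
```

The grid is a simplification for illustration: in practice the integral over D is typically approximated with background points, and real occurrence records are rarely the independent draws this simulation assumes.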
The proportionality constant c, equal to the ratio of the true abundance of the species to that predicted by the model, cannot be determined from occurrence records alone (Fithian and Hastie 2013). Rather, some independent measure or estimate of total population size is required to estimate c and hence absolute abundance. Maxent models (derived from occurrence records) have indeed been found to show correlation with independently measured local abundance (VanDerWal et al. 2009, Weber et al. 2016). We note, however, that although a linear relationship is theoretically predicted between λ(z) (Maxent’s ‘raw’ output format) and local abundance, a nonlinear (but still monotonic) relationship is predicted for transformed outputs such as the logistic transform used by VanDerWal et al. (2009).

From relative abundance to probability of presence

The interpretation of Maxent as an IPP allows Maxent’s ‘raw’ output format to be used directly as a model of relative abundance. However, many SDM uses call for models of probability of presence. Additionally, maps made from raw output often do not match ecologists’ intuition about the (potential or realized) distribution of their study species. For these reasons, Maxent’s default output scaling is a model of probability of presence, with an important caveat. Consider a quadrat within the domain D, and assume the environmental conditions are constant within the quadrat. The IPP estimates the species’ absolute abundance in the quadrat as a Poisson variable with the following mean:

$$\text{Predicted mean abundance} = c \times A \times \exp(\alpha + \beta' x(z)) \tag{3}$$

where A is the area of the quadrat. The probability of presence of the species in the quadrat is the probability that there is at least one individual there, which according to the Poisson distribution is:

$$\text{Probability of presence} = 1 - \exp\left(-c \times A \times \exp(\alpha + \beta' x(z))\right) \tag{4}$$

thus yielding a Bernoulli generalized linear model whose link function is termed a complementary log-log (cloglog) link (Fithian et al. 2015). Note, however, that this derivation relies on the species’ presence or absence at nearby sites being independent. Therefore, it may not be appropriate when species distributions (including patterns of abundance) exhibit spatial dependence beyond that owing to spatial autocorrelation of the utilized predictor variables. For example, positive autocorrelation of individuals occurs for flocking birds and for plants with limited dispersal abilities, and negative autocorrelation for territorial mammals. Note that the IPP literature includes methods to detect spatial dependence (such as Ripley’s K-function) and to incorporate it into the model (e.g. using area-interaction processes) at the cost of increasing modeling complexity (Renner et al. 2015).

Figure 1. Comparison of logistic and cloglog transforms of Maxent output. For simplicity, no constant offset is shown in the transforms. A constant offset (such as the entropy of pλ, as described in the text) would shift both curves to the left or to the right, depending on its sign.

Because of the above derivation, a cloglog transform appears to be most appropriate for estimating probability of presence, and Maxent ver. 3.4.0 now uses it by default. The previous default (a logistic transform, Phillips and Dudík 2008) is now available as an option. For both transforms, the entropy $H = -E_\lambda[\ln(p_\lambda)]$ of the probability distribution pλ is used as a constant offset to the linear model; specifically, the new cloglog transformation estimates probability of presence as:

$$\text{Probability of presence} = 1 - \exp\left(-\exp(H)\, p_\lambda(z)\right) \tag{5}$$

Because $\exp(\alpha + \beta' x(z)) = \lambda(z) = p_\lambda(z) \int_D \lambda(z)\,dz$ (Eqs 1 and 2), Eq. 5 amounts to replacing the unknown product $c \times A \times \int_D \lambda(z)\,dz$ in Eq. 4 by $\exp(H)$. Note that this estimate is appropriate for some quadrat size, but we cannot say explicitly what that size is, since it depends on the (unknown) c (this is the important caveat mentioned at the beginning of this section). At best, we can define the quadrat size implicitly: consider a point z whose log probability under pλ equals the mean log probability, i.e. $\ln(p_\lambda(z)) = E_\lambda[\ln(p_\lambda)]$. Such a point could be called a ‘typical’ location of the species, as its predicted log abundance is average among all points where individuals of the species are found. The predicted probability of occurrence in a quadrat centered at such a typical point z is 1 − 1/e ≈ 0.632 (since there $\exp(H)\,p_\lambda(z) = 1$ in Eq. 5), corresponding to a predicted abundance of one individual per quadrat. This is similar to Maxent’s logistic output, which gives a predicted probability of occurrence of 0.5 for such a location. In general, the cloglog transformed output is somewhat greater than the logistic one (Fig. 1), especially at higher values. The main effect of using the cloglog rather than the logistic transform is that areas of moderately high output (yellow and orange in Fig. 2 left) are more strongly predicted (relatively warmer colors in Fig. 2 right). We emphasize that the use of entropy as an offset is somewhat arbitrary, but has the advantage of being scale independent (for example, it would not be affected by changing units from meters to kilometers) and produces output values (and hence mapped predictions) with good visual discrimination across the same range of values (0–1) for all species. Importantly, whenever more is known about the species, such as its absolute abundance at some sites or its total population size, the use of entropy as an offset can be avoided by deriving an estimate of c and therefore the probability of presence for quadrats of any given size. This is analogous to using additional information about the
species’ prevalence to derive an appropriate offset for the logistic transform (Guillera-Arroita et al. 2014).

Figure 2. A Maxent model for the brown-throated three-toed sloth Bradypus variegatus using the logistic transform (left) and cloglog transform (right). Occurrence data from Anderson and Handley (2001); for details on predictor variables, see Phillips et al. (2006).

Although the above derivation of the cloglog transform provides a stronger theoretical justification than the robust Bayes argument (Phillips and Dudík 2008) for the logistic transform, the cloglog transform may have only a small effect on model performance. On a large reference data set (that of Elith et al. 2006), the cloglog transform marginally lowered values of model calibration (measured by correlation with 0/1 evaluation data encoding observed absences/presences, known as the COR statistic) relative to the logistic transform for models made with random background data (Table 1). In contrast, and more importantly, it improved this measure of model performance when target-group background (Phillips et al. 2009) was used to reduce the effects of sample selection bias. The raw output (the exponential model of Eqs 1 and 2) substantially underperformed cloglog (and other output formats), which is as expected given that COR measures ability to predict probability of presence rather than abundance. We note that rank-based statistics such as AUC (area under the receiver operating characteristic curve) are unaffected by the logistic and cloglog transforms, since both are monotonically increasing functions of the raw output and so preserve its ranking of sites.

Table 1. Comparison of performance of Maxent models with varying choices of feature classes and output transforms, for a reference data set of occurrence records of 226 species, and presence/absence data for model evaluation (Elith et al. 2006). Maxent feature classes are abbreviated as ‘l’ (linear), ‘q’ (quadratic), ‘p’ (product), ‘t’ (threshold) and ‘h’ (hinge). Results are shown for analyses run with random background pixels, as well as for those implementing a target-group background (Phillips et al. 2009). The AUC statistic measures area under the receiver operating characteristic curve, while COR measures the correlation between model output and 0/1 evaluation data representing observed absence/presence.

Feature classes  Output scaling  Study with these defaults  AUC (random bg)  COR (random bg)  AUC (target-group bg)  COR (target-group bg)
lqpt             Cumulative      Elith et al. 2006          0.7220           0.1989           0.7534                 0.2368
lqpth            Logistic        Phillips and Dudík 2008    0.7282           0.2110           0.7575                 0.2447
lqph             Raw             –                          0.7296           0.1855           0.7593                 0.2404
lqph             Logistic        –                          0.7296           0.2125           0.7593                 0.2465
lqph             Cloglog         Present paper: Maxent      0.7296           0.2120           0.7593                 0.2479
lqph             Cloglog         Present paper: maxnet      0.7271           0.2100           0.7587                 0.2490

Maxent as infinitely-weighted logistic regression (IWLR): the maxnet package

Because Maxent is equivalent to an IPP, standard generalized linear modeling software can be used to fit Maxent models via Poisson regression (Renner and Warton 2013), or even more conveniently, using standard logistic regression (Fithian and Hastie 2013). Specifically, the latter authors showed that the coefficients β of the Maxent or IPP model can be fitted via a weighting process they call infinitely-weighted logistic regression (IWLR). The idea is to fit a logistic model to occurrence records (with response variable y = 1) and background data (points chosen randomly from the domain D, with response variable y = 0). This process yields coefficients β for an exponential model, and has been used in studies of resource selection by animals (Manly et al. 2002), but may not produce the same values of the coefficients as Maxent. The novel contribution of Fithian and Hastie (2013) was to give a large weight W to all the background data and to show that the limit (as W tends to infinity) of the resulting vector of logistic regression coefficients equals the Maxent (and IPP) coefficients. This allows Maxent (and IPP) models to be fitted using standard GLM software. The
new R package for fitting Maxent models (maxnet, available at < https://CRAN.R-project.org/package=maxnet >) does just this, leveraging the glmnet R package (Friedman et al. 2010) to fit an ℓ1-regularized logistic regression model with a large weight W. A weight of W = 100 is used by default for background points, in contrast to a weight of 1 for occurrence records.

Instead of upweighting background points, we may equivalently downweight presence points (Renner et al. 2015). In addition, the intercept term (α, above) can be manipulated by choosing the background weights based on the area of the study region, so that the resulting IPP is scaled in units of occurrence records per unit area (Renner et al. 2015). Given the area of the study region, this weighting scheme could easily be used with maxnet, though it would not affect any of the standard output formats (raw, cloglog etc.) since none of them use α.

There is a variety of R packages available for fitting point process models (Renner et al. 2015), so why introduce another? The novel contribution of maxnet is to implement all the derived feature classes (especially hinge features) and default tuned regularization values of the Maxent Java application, so that Maxent models can be fitted natively and easily in R. The package is brief: about 200 lines of code implementing feature classes and regularization parameters, model fitting, predicting from a model, and plotting of response curves. Additionally, it provides some simple use examples based on the Bradypus variegatus (brown-throated three-toed sloth) data set from Phillips et al. (2006). The purpose of the package is to replicate the behaviour of the Maxent Java application by using the equivalence with IPPs; this complements Renner et al. (2015), who show (Supplementary material Appendix 1 section 6) how to adjust default settings in the Maxent application in order to fit an IPP.

When run on the data set of Elith et al. (2006), maxnet has similar performance to the Maxent Java application. Small differences are likely due primarily to different implementations of hinge features and different random choices of background data. In order to limit computation time, the maxnet implementation of hinge features uses 50 hinge features per environmental variable by default, with evenly spaced knots, in contrast to Maxent, which may use one knot per unique value of the environmental variable. The scripts used to run maxnet on that data set appear in the Supplementary material Appendix 1, along with further examples of usage.

A change in default feature classes

Both the previous and current releases of Maxent allow the use of quadratic, product (or interaction), threshold (or step-function) and hinge (piecewise linear) features, in addition to the original environmental variables (linear features). The default selection of feature classes for use in the model depends only on sample size, though ℓ1-regularization forces many coefficients to zero. Given enough occurrence records (80 or more), all of the derived feature classes were previously considered by the model. Version 3.4.0 differs by omitting threshold features by default (although they are available as an option), since this appears to improve model performance generally and results in models that are smoother and simpler, and hence likely to be more realistic. Avoiding use of threshold features makes a small but useful improvement to the performance of Maxent (Table 1) on the data set of Elith et al. (2006). The differences in AUC and COR are similar to differences within the three groupings of modeling methods in Elith et al. (2006), but smaller than differences among groupings. Apparently hinge features, which were introduced to Maxent later than threshold features (Phillips and Dudík 2008), are best used as a replacement for threshold features rather than as a complement. Hinge features provide at least as much flexibility in the fitted response to predictor variables as threshold features, while tending to reduce over-fitting to the training data. The scripts used to run Maxent with various settings for Table 1 appear in the Supplementary material Appendix 1. Unfortunately, the data set of Elith et al. (2006) is not yet publicly available, but if and when it is publicly released, a link to it will be added to the AMNH web site, so that the values in Table 1 can be easily replicated.

We emphasize that the settings used in the analyses reported in Table 1 are merely defaults, and the best choice for other data sets may be different (Merow et al. 2013, Radosavljevic and Anderson 2014). We have also found that product features barely improve average performance on the data set of Elith et al. (2006) (not shown), and could usually be omitted in order to make simpler and more easily interpreted models. Importantly, avoiding product features enables the use of the Explain tool to interactively explore model predictions (Elith et al. 2010, Renner et al. 2015).

Future directions

Species distribution models based on Maxent and IPP remain an active area of research, as new methods are developed to accommodate the challenging nature of occurrence data and species distributions (Fithian et al. 2015, Merow et al. 2016). The open-source release of the Maxent Java code, together with the maxnet R package, will facilitate the work of others in improving the science of modeling species distributions. Similarly, we hope that it will facilitate the practical and public use of Maxent for mapping and preserving biodiversity, as done by the Atlas of Living Australia (< www.ala.org.au/ >).

The IPP perspective on Maxent input requirements and interpretation of model outputs should provide new direction for research regarding studies of population abundance. One conclusion is that under certain assumptions regarding the occurrence data, Maxent’s raw (exponential) output can be interpreted as a model of relative abundance (Renner et al. 2015). A natural question arising from this is whether in practice raw output indeed correlates better with local abundance than logistic output (as used by VanDerWal et al. 2009). The answer should also inform meta-analyses of relationships between SDM output and abundance and other measures of population performance (Weber et al. 2016).

The IPP model also offers new insights into spatial dependence for Maxent models, along with some tools for
detecting spatial trends in residuals (Renner et  al. 2015). and innovate towards improvement of the software, and to
When we use the IPP intensity to infer a Poisson distribu- generate new open-source software resources and tools.
tion for abundance within a quadrat (Eq. 3) and thereby To cite Maxent or acknowledge its use, cite this Software
determine probability of presence therein, we make a strong note as follows, substituting the version of the application
independence assumption, namely that presence of the spe- that you used for ‘version 0’:
cies at nearby points within the quadrat is conditionally inde- Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E. and
pendent given the predictor variables. This assumption may Blair, M. E. 2017. Opening the black box: an open-source
often be violated on fine spatial scales, for example because release of Maxent. – Ecography 40: 000–000 (ver. 0).
individuals in close proximity interact through competition,
reproduction, etc., and disturbances such as fire impose spa-
tial signatures on species’ patterns of distribution (including Acknowledgements – We thank the Center for Biodiversity and
Conservation at the American Museum of Natural History for
abundance). Outstanding questions include: how impor-
hosting the open-source release of Maxent, especially via the efforts
tant is this violation in practice when using IPP/Maxent for of Eleanor Sterling, Peter Ersts and Ned Horning. We appreciate
modeling abundance or probability of presence from occur- the thoughtful and helpful comments made by reviewers of the first
rence data? In what circumstances should modelers resort draft of this paper.
to more complex variants of IPPs (such as Gibbs or Cox Funding – RPA acknowledges the support of the U.S. National
processes; Renner et  al. 2015) to explicitly model spatial Science Foundation (NSF DEB-1119915 and DBI-1650241).­­
dependence? Similar issues arise when modeling abundance
from count data: patterns of occurrence and abundance may
be affected by different processes, resulting in excess zeroes References
in the abundance data (Wenger and Freeman 2008). This is
likely related to the finding that Maxent models predict a Aarts, G. et  al. 2012. Comparative interpretation of count,
‘potential/maximal abundance’, which may not be attained presence–absence and point methods for species distribution
at all sites (VanDerWal et al. 2009). Zero-inflated models are models. – Methods Ecol. Evol. 3: 177–187.
Aiello-Lammens, M. E. et  al. 2015. spThin: an R package for
often used in place of simpler Poisson models when modeling abundance from count data that exhibit excess zeroes (Barry and Welsh 2002). Perhaps similar ideas can be applied to occurrence data.

The maxnet R package encodes the feature classes and regularization defaults of the Maxent Java application so that it fits the same models. This opens up new ways to better integrate Maxent modeling with the wide variety of visualization and analysis tools available in R. Future contributions to maxnet could facilitate this integration, for example code and/or a vignette linking maxnet with the dismo package or the ENMeval package (Muscarella et al. 2014), which manage data preparation, modeling and evaluation for SDM. A test suite would be very helpful to ensure consistency as new functionality is added to the package. Only some capabilities of glmnet are used by maxnet; others (such as elastic net regularization and data-driven feature selection) could be incorporated. Alternatively, glmnet or other IPP packages (such as ppmlasso; Renner et al. 2015) could be applied directly to the data set of Elith et al. (2006) (i.e. not via maxnet). Their performance could then be compared with that of maxnet reported here, to determine the most effective use of IPPs on species occurrence data. In addition, the standard collection of statistical analyses and maps that the Maxent Java application produces as HTML output could be assembled in R. It would then be available for any species distribution modeling method with an implementation in R.

Finally, we invite developers and modelers to imagine new capabilities for the Maxent Java application and to contribute to its development. Free and open access to data, software, tools, publications and other resources facilitates key steps towards more informed and powerful models and more inclusive research outcomes (Soberón and Peterson 2004). Through these open-source releases we aim to empower Maxent users as a community to use, contribute

spatial thinning of species occurrence records for use in ecological niche models. – Ecography 38: 541–545.
Anderson, R. P. 2013. A framework for using niche models to estimate impacts of climate change on species distributions. – Ann. N. Y. Acad. Sci. 1297: 8–28.
Anderson, R. P. and Handley, C. O. 2001. A new species of three-toed sloth (Mammalia: Xenarthra) from Panamá, with a review of the genus Bradypus. – Proc. Biol. Soc. Washington 114: 1–33.
Barry, S. C. and Welsh, A. 2002. Generalized additive modelling and zero inflated count data. – Ecol. Model. 157: 179–188.
Bocher, E. and Neteler, M. (eds) 2012. Geospatial free and open source software in the 21st century. – Springer.
Boria, R. A. et al. 2014. Spatial filtering to reduce sampling bias can improve the performance of ecological niche models. – Ecol. Model. 275: 73–77.
Cressie, N. 1993. Statistics for spatial data. – Wiley.
Diggle, P. 2003. Statistical analysis of spatial point patterns. Mathematics in biology. – Arnold.
Elith, J. et al. 2006. Novel methods improve prediction of species’ distributions from occurrence data. – Ecography 29: 129–151.
Elith, J. et al. 2010. The art of modelling range-shifting species. – Methods Ecol. Evol. 1: 330–342.
Elith, J. et al. 2011. A statistical explanation of MaxEnt for ecologists. – Divers. Distrib. 17: 43–57.
Fithian, W. and Hastie, T. 2013. Finite-sample equivalence in statistical models for presence-only data. – Ann. Appl. Stat. 7: 1917–1939.
Fithian, W. et al. 2015. Bias correction in species distribution models: pooling survey and collection data for multiple species. – Methods Ecol. Evol. 6: 424–438.
Friedman, J. et al. 2010. Regularization paths for generalized linear models via coordinate descent. – J. Stat. Softw. 33: 1–22.
Guillera-Arroita, G. et al. 2014. Maxent is not a presence–absence method: a comment on Thibaud et al. – Methods Ecol. Evol. 5: 1192–1197.
Manly, B. et al. 2002. Resource selection by animals: statistical design and analysis for field studies, 2nd ed. – Kluwer Press.

Merow, C. et al. 2013. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. – Ecography 36: 1058–1069.
Merow, C. et al. 2016. Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. – Global Ecol. Biogeogr. 25: 1022–1036.
Muscarella, R. et al. 2014. ENMeval: an R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. – Methods Ecol. Evol. 5: 1198–1205.
Phillips, S. J. and Dudík, M. 2008. Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. – Ecography 31: 161–175.
Phillips, S. J. et al. 2004. A maximum entropy approach to species distribution modeling. – In: Proceedings of the Twenty-First International Conference on Machine Learning. ACM Press, pp. 472–486.
Phillips, S. J. et al. 2006. Maximum entropy modeling of species geographic distributions. – Ecol. Model. 190: 231–259.
Phillips, S. J. et al. 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. – Ecol. Appl. 19: 181–197.
Radosavljevic, A. and Anderson, R. P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. – J. Biogeogr. 41: 629–643.
Reddy, S. and Dávalos, L. M. 2003. Geographical sampling bias and its implications for conservation priorities in Africa. – J. Biogeogr. 30: 1719–1727.
Renner, I. W. and Warton, D. I. 2013. Equivalence of Maxent and Poisson point process models for species distribution modeling in ecology. – Biometrics 69: 274–281.
Renner, I. W. et al. 2015. Point process models for presence-only analysis. – Methods Ecol. Evol. 6: 366–379.
Soberón, J. and Peterson, A. T. 2004. Biodiversity informatics: managing and applying primary biodiversity data. – Phil. Trans. R. Soc. B 359: 689–698.
Syfert, M. M. et al. 2013. The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models. – PLoS One 8: e55158.
VanDerWal, J. et al. 2009. Abundance and the environmental niche: environmental suitability estimated from niche models predicts the upper limit of local abundance. – Am. Nat. 174: 282–291.
Warton, D. and Shepherd, L. 2010. Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. – Ann. Appl. Stat. 4: 1383–1402.
Weber, M. et al. 2016. Is there a correlation between abundance and environmental suitability derived from ecological niche modelling? A meta-analysis. – Ecography doi: 10.1111/ecog.02125
Wenger, S. J. and Freeman, M. C. 2008. Estimating species occurrence, abundance, and detection probability using zero-inflated distributions. – Ecology 89: 2953–2959.
Wolkovich, E. M. et al. 2012. Advances in global change research require open science by individual researchers. – Global Change Biol. 18: 2102–2110.

Supplementary material (Appendix ECOG-03049 at <www.ecography.org/appendix/ecog-03049>). Appendix 1.
