SaTScan™ User Guide
By Martin Kulldorff
February, 2015
http://www.satscan.org/
Table of Contents
Introduction
  The SaTScan Software
  Download and Installation
  Test Run
  Help System
  Sample Data Sets
Statistical Methodology
  Spatial, Temporal and Space-Time Scan Statistics
  Bernoulli Model
  Discrete Poisson Model
  Space-Time Permutation Model
  Multinomial Model
  Ordinal Model
  Exponential Model
  Normal Model
  Continuous Poisson Model
  Probability Model Comparison
  Likelihood Ratio Test
  Secondary Clusters
  Adjusting for More Likely Clusters
  Covariate Adjustments
  Spatial and Temporal Adjustments
  Missing Data
  Multivariate Scan with Multiple Data Sets
Comparison with Other Methods
  Scan Statistics
  Spatial and Space-Time Clustering
Input Data
  Data Requirements
  Case File
  Control File
  Population File
  Coordinates File
  Grid File
  Non-Euclidian Neighbors File
  Meta Location File
  Max Circle Size File
  Adjustments File
  Alternative Hypothesis File
  SaTScan Import Wizard
  SaTScan ASCII File Format
Basic SaTScan Features
  Input Tab
  Analysis Tab
  Output Tab
Advanced Features
  Multiple Data Sets Tab
  Data Checking Tab
  Spatial Neighbors Tab
  Spatial Window Tab
  Temporal Window Tab
  Spatial and Temporal Adjustments Tab
  Inference Tab
SaTScan User Guide v9.4
Introduction
The SaTScan Software
Purpose
SaTScan is free software that analyzes spatial, temporal and space-time data using the spatial, temporal
or space-time scan statistics. It is designed for any of the following interrelated purposes:
Perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and
to see if they are statistically significant.
Test whether a disease is randomly distributed over space, over time or over space and time.
Evaluate the statistical significance of disease cluster alarms.
Perform prospective real-time or time-periodic disease surveillance for the early detection of
disease outbreaks.
The software may also be used for similar problems in other fields such as archaeology, astronomy,
botany, criminology, ecology, economics, engineering, forestry, genetics, geography, geology, history,
neurology or zoology.
The development of SaTScan has been funded by:
National Cancer Institute, Division of Cancer Prevention, Biometry Branch [v1.0, 2.0, 2.1]
National Cancer Institute, Division of Cancer Control and Population Sciences, Statistical
Research and Applications Branch [v3.0 (part), v6.1 (part), 8.0 (part), v9.0 (part)]
Alfred P. Sloan Foundation, through a grant to the New York Academy of Medicine (Farzad
Mostashari, PI) [v3.0 (part), 3.1, 4.0, 5.0, 5.1]
Centers for Disease Control and Prevention, through Association of American Medical Colleges
Cooperative Agreement award number MM-0870 [v6.0, 6.1 (part)].
National Institute of Child Health and Development, through grant #RO1HD048852 [7.0, 8.0, 9.0
(part)]
National Cancer Institute, Division of Cancer Epidemiology and Genetics [v9.0 (part), v9.1]
National Institute of General Medical Sciences, through a Modeling Infectious Disease Agent
Studies grant #U01GM076672 [v9.0 (part)]
Their financial support is greatly appreciated. The contents of SaTScan are the responsibility of the
developer and do not necessarily reflect the official views of the funders.
Related Topics: Statistical Methodology, SaTScan Bibliography
Test Run
Before using your own data, we recommend trying one of the sample data sets provided with the
software. Use these to get an idea of how to run SaTScan. To perform a test run:
1. Click on the SaTScan application icon.
2. Click on Open Saved Session.
3. Select one of the parameter files, for example nm.prm (Poisson model), NHumberside.prm
(Bernoulli model) or NYCfever.prm (space-time permutation model).
4. Click on Open.
5. Click on the Execute button. A new window will open with the program running in the top
section and a Warnings/Errors section below. When the program finishes running, the results will
be displayed.
Note: The sample files should not produce warnings or errors.
Related Topics: Sample Data Sets.
Help System
The SaTScan help system consists of four parts:
i. SaTScan User Guide in PDF format, located in the same folder as the SaTScan executable. It can
also be obtained from the SaTScan web site (www.satscan.org/techdoc.html) or directly within
the SaTScan software by selecting Help > User Guide. You may print this as a single document
for easy reference.
ii. SaTScan help entries, extracted from the User Guide. The complete set of entries can be found
within the SaTScan software by pressing F1 or by selecting Help > Help Content. The system can
be searched by clicking the magnifying glass. Many individual entries can also be reached
directly by clicking on any sub-title seen on the input tabs.
iii. Methodological papers that describe the statistical methods available in the SaTScan software in
detail. These papers are listed in the SaTScan bibliography, which can be found both at the end
of the User Guide and on the web (http://www.satscan.org/references.html). The bibliography
also contains a large number of papers that have applied different SaTScan features to a wide
range of data types. These can serve as inspiration for how SaTScan can be used for
different types of scientific and public health problems.
iv. Sample data sets, provided for each of the SaTScan probability models. They are described below
and make it easy to become familiar with the software.
This is a condensed version of a more complete data set with the population given for each year from
1973 to 1991, and with ethnicity as a third covariate. The complete data set can be found at:
http://www.satscan.org/datasets/
Statistical Methodology
For all discrete spatial and space-time analyses, the user must provide data containing the spatial
coordinates of a set of locations (coordinates file). For each location, the data must furthermore contain
information about the number of cases at that location (case file). For temporal and space-time analyses,
the number of cases must be stratified by time, e.g. the time of diagnosis. Depending on the type of
analysis, other information about cases such as age, gender, weight, length of survival and/or cancer stage
may also be provided. For the Bernoulli model, it is also necessary to specify the number of controls at
each location (control file). For the discrete Poisson model, the user must specify a population size for
each location (population file). The population may vary over time.
Scan statistics are used to detect and evaluate clusters of cases in either a purely temporal, purely spatial
or space-time setting. This is done by gradually scanning a window across time and/or space, noting the
number of observed and expected observations inside the window at each location. In the SaTScan
software, the scanning window is an interval (in time), a circle or an ellipse (in space) or a cylinder with a
circular or elliptic base (in space-time). It is also possible to specify your own non-Euclidean distance
structure in a special file. Many different window sizes are used. The window with the maximum
likelihood is the most likely cluster, that is, the cluster least likely to be due to chance. A p-value is
assigned to this cluster.
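The sketch below illustrates how circular windows could be enumerated in a purely spatial analysis. It makes several simplifying assumptions. Each circle is centred on a data location, which is the default when no grid file is given. The circle grows by adding locations in order of Euclidean distance until a maximum share of the total population would be exceeded. The function name `circular_windows` and the parameter `max_fraction` are hypothetical and are not part of SaTScan. Ties in distance, elliptic windows and the time dimension are ignored.

```python
import math
from typing import Dict, FrozenSet, List, Tuple


def circular_windows(coords: Dict[str, Tuple[float, float]],
                     population: Dict[str, float],
                     max_fraction: float = 0.5) -> List[FrozenSet[str]]:
    """Return the distinct sets of locations covered by circles centred on each location.

    Hypothetical helper: each circle grows one nearest location at a time and stops
    before its population would exceed max_fraction of the total population.
    """
    total = sum(population.values())
    limit = max_fraction * total
    seen = set()
    windows: List[FrozenSet[str]] = []
    for cx, cy in coords.values():
        # Order all locations by Euclidean distance from the current centre.
        ordered = sorted(coords, key=lambda loc: math.dist((cx, cy), coords[loc]))
        members: List[str] = []
        covered = 0.0
        for loc in ordered:
            if covered + population[loc] > limit:
                break  # a larger circle would exceed the maximum cluster size
            members.append(loc)
            covered += population[loc]
            window = frozenset(members)
            if window not in seen:  # different centres often produce the same window
                seen.add(window)
                windows.append(window)
    return windows
```

In a full analysis, a likelihood function would be evaluated for every window returned here. The window with the maximum value is the most likely cluster.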
Scan statistics use a different probability model depending on the nature of the data. A Bernoulli, discrete
Poisson or space-time permutation model is used for count data such as the number of people with
asthma; a multinomial model is used for categorical data such as cancer histology; an ordinal model for
ordered categorical data such as cancer stage; an exponential model for survival time data with or without
censoring; and a normal model for other continuous data such as birth weight or blood lead levels. The
general statistical theory behind the spatial and space-time scan statistics used in the SaTScan software is
described in detail by Kulldorff (1997)1 for the Bernoulli, discrete Poisson and continuous Poisson
models; by Kulldorff et al. (2005)5 for the space-time permutation model; by Jung et al. (2008)6 for the
multinomial model; by Jung et al. (2007)7 for the ordinal model; by Huang et al. (2006)8 for the
exponential model, by Kulldorff et al. (2009)9 for the normal model and by Huang et al. (2009)10 for the
normal model with weights. Please read these papers for a detailed description of each model. Here we
only give a brief non-mathematical description.
For all discrete probability models, the scan statistic adjusts for the uneven geographical density of a
background population. For all models, the analyses are conditioned on the total number of cases
observed.
Related Topics: The SaTScan Software, Basic SaTScan Features, Advanced Features, Analysis Tab,
Methodological Papers.
The user defines the set of grid points used through a grid file. If no grid file is specified, the grid points
are set to be identical to the coordinates of the location IDs defined in the coordinates file. The latter
option ensures that each data location is a potential cluster in itself, and it is the recommended option for
most types of analyses.
As an alternative to the circle, it is also possible to use an elliptic window shape, in which case a set of
ellipses with different shapes and angles is used as the scanning window together with the circle. This
provides slightly higher power for true clusters that are long and narrow in shape, and slightly lower
power for circular and other very compact clusters.
It is also possible to define your own non-Euclidean distance metric using a special neighbors file.
Related Topics: Analysis Tab, Coordinates File, Elliptic Scanning Window, Grid File, Maximum Spatial
Cluster Size, Spatial Window Tab.
The spatial variation in temporal trends scan statistic can only be run with the discrete Poisson probability
model. For it to work, the total study period length must be evenly divisible by the length of the time
interval aggregation. This ensures that all time intervals have the same length. They have the same number
of years if the aggregation is specified in years, the same number of months if it is specified in months,
or the same number of days if it is specified in days.
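The divisibility requirement can be checked before running the analysis. The sketch below assumes that both lengths are already expressed as whole numbers in the same unit (years, months or days). `interval_count` is a hypothetical helper and is not part of SaTScan.

```python
def interval_count(study_length: int, aggregation_length: int) -> int:
    """Return the number of equal-length time intervals in the study period.

    Hypothetical helper: both arguments must use the same unit. Raises
    ValueError if the study period does not split evenly.
    """
    if aggregation_length <= 0:
        raise ValueError("the time aggregation length must be positive")
    intervals, remainder = divmod(study_length, aggregation_length)
    if remainder != 0:
        raise ValueError(
            f"study period of {study_length} is not evenly divisible by {aggregation_length}"
        )
    return intervals
```

For example, a 120-month study period aggregated into 12-month intervals gives 10 intervals. A 10-year period aggregated into 3-year intervals is rejected.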
Related Topics: Analysis Tab, Space-Time Scan Statistic.
Bernoulli Model
With the Bernoulli model1,2, there are cases and non-cases represented by a 0/1 variable. These variables
may represent people with or without a disease, or people with different types of disease such as early and
late stage breast cancer. They may reflect cases and controls from a larger population, or they may
together constitute the population as a whole. Whatever the situation may be, these variables will be
denoted as cases and controls throughout the user guide, and their total will be denoted as the population.
Bernoulli data can be analyzed with the purely temporal, the purely spatial or the space-time scan
statistics.
Example: For the Bernoulli model, cases may be newborns with a certain birth defect while controls are
all newborns without that birth defect.
The Bernoulli model requires information about the location of a set of cases and controls, provided to
SaTScan using the case, control and coordinates files. Separate locations may be specified for each case
and each control, or the data may be aggregated for states, provinces, counties, parishes, census tracts,
postal code areas, school districts, households, etc., with multiple cases and controls at each data location.
To do a temporal or space-time analysis, it is necessary to have a time for each case and each control as
well.
Related Topics: Analysis Tab, Case File, Control File, Coordinates File, Likelihood Ratio Test,
Methodological Papers, Probability Model Comparison.
population size for a given location and time period, the population size, as defined above, is integrated
over the time period in question.
Related Topics: Analysis Tab, Case File, Continuous Poisson Model, Coordinates File, Likelihood Ratio
Test, Methodological Papers, Population File, Probability Model Comparison.
Multinomial Model
With the multinomial model6, each observation is a case, and each case belongs to one of several
categories. The multinomial scan statistic evaluates whether there are any clusters where the distribution
of cases is different from the rest of the study region. For example, there may be a higher proportion of
cases of types 1 and 2 and a lower proportion of cases of type 3 while the proportion of cases of type 4 is
about the same as outside the cluster. If there are only two categories, the multinomial model is identical to the
Bernoulli model, where one category represents the cases and the other category represents the controls.
The cases in the multinomial model may be a sample from a larger population or they may constitute a
complete set of observations. Multinomial data can be analyzed with the purely temporal, the purely
spatial or the space-time scan statistics.
Example: For the multinomial model, the data may consist of everyone diagnosed with meningitis, with
five different categories representing five different clonal complexes of the disease6. The multinomial
scan statistic will simultaneously look for high or low clusters of any of the clonal complexes, or a group
of them, adjusting for the overall geographical distribution of the disease. The multiple comparisons
inherent in the many categories used are accounted for when calculating the p-values.
The multinomial model requires information about the location of each case in each category. A unique
location may be specified for each case, or the data may be aggregated for states, provinces, counties,
parishes, census tracts, postal code areas, school districts, households, etc., with multiple cases in the same
location. To do a temporal or space-time analysis, it is necessary to have a time for each case as well.
With the multinomial model it is not necessary to specify a search for high or low clusters, since there is
no hierarchy among the categories, but in the output it is shown what categories are more prominent
inside the cluster. The order or indexing of the categories does not affect the analysis in terms of the
clusters found, but it may influence the randomization used to calculate the p-values.
Related Topics: Analysis Tab, Case File, Coordinates File, Likelihood Ratio Test, Methodological
Papers, Probability Model Comparison.
Ordinal Model
With the ordinal model7, each observation is a case, and each case belongs to one of several ordinal
categories. If there are only two categories, the ordinal model is identical to the Bernoulli model, where
one category represents the cases and the other category represents the controls.
The cases in the ordinal model may be a sample from a larger population or they may constitute a
complete set of observations. Ordinal data can be analyzed with the purely temporal, the purely spatial or
the space-time scan statistics.
Example: For the ordinal model, the data may consist of everyone diagnosed with breast cancer during a
ten-year period, with three different categories representing early, medium and late stage cancer at the
time of diagnosis.
The ordinal model requires information about the location of each case in each category. Separate
locations may be specified for each case, or the data may be aggregated for states, provinces, counties,
parishes, census tracts, postal code areas, school districts, households, etc., with multiple cases in the same
or different categories at each data location. To do a temporal or space-time analysis, it is necessary to
have a time for each case as well.
With the ordinal model it is possible to search for high clusters, with an excess of cases in the high-valued
categories, for low clusters with an excess of cases in the low-valued categories, or simultaneously for
both types of clusters. Reversing the order of the categories has the same effect as changing the analysis
from high to low and vice versa.
Related Topics: Analysis Tab, Case File, Coordinates File, Likelihood Ratio Test, Methodological
Papers, Probability Model Comparison.
Exponential Model
The exponential model8 is designed for survival time data, although it could be used for other continuous
type data as well. Each observation is a case, and each case has one continuous variable attribute as well
as a 0/1 censoring designation. For survival data, the continuous variable is the time between diagnosis
and death or, depending on the application, between two other types of events. If some of the data are
censored due to loss to follow-up, the continuous variable is instead the time between diagnosis and
censoring. The 0/1 censoring variable is used to distinguish between censored and non-censored
observations.
Example: For the exponential model, the data may consist of everyone diagnosed with prostate cancer
during a ten-year period, with information about either the length of time from diagnosis until death or
from diagnosis until a time of censoring after which survival is unknown.
When using the temporal or space-time exponential model for survival times, it is important to realize that
there are two very different time variables involved. The first is the time the case was diagnosed, and that
is the time that the temporal and space-time scanning window is scanning over. The second is the survival
time, that is, time between diagnosis and death or for censored data the time between diagnosis and
censoring. This is an attribute of each case, and there is no scanning done over this variable. Rather, we
are interested in whether the scanning window includes exceptionally many cases with a small or large
value of this attribute.
It is important to note that while the exponential model uses a likelihood function based on the
exponential distribution, the true survival time distribution need not be exponential, and the statistical
inference (p-value) is valid for other survival time distributions as well. The reason for this is that the
randomization is not done by generating observations from the exponential distribution, but rather, by
permuting the space-time locations and the survival time/censoring attributes of the observations.
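The sketch below shows how one null-hypothesis replicate could be generated by permutation, as described above. Each observation keeps its space-time location, while the (survival time, censoring) pairs are shuffled among the observations. The function name, the data layout and the 0/1 coding (1 = censored) are assumptions made for illustration. SaTScan's own randomization may differ in detail.

```python
import random
from typing import List, Sequence, Tuple

# (survival time, censoring flag); the coding 1 = censored, 0 = not censored is assumed here.
Attribute = Tuple[float, int]


def permuted_replicate(locations: Sequence[str],
                       attributes: Sequence[Attribute],
                       rng: random.Random) -> List[Tuple[str, Attribute]]:
    """Pair each fixed space-time location with a randomly reassigned attribute pair.

    Survival time and censoring flag stay together, so the replicate contains exactly
    the same attribute values as the real data, only in a different arrangement.
    """
    if len(locations) != len(attributes):
        raise ValueError("each observation needs exactly one location and one attribute pair")
    shuffled = list(attributes)
    rng.shuffle(shuffled)
    return list(zip(locations, shuffled))
```

No values are drawn from an exponential distribution. This is why the p-value does not depend on the survival times actually being exponential.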
Related Topics: Likelihood Ratio Test, Analysis Tab, Probability Model Comparison, Methodological
Papers.
Normal Model
The normal model9 is designed for continuous data. For each individual or for each observation, called a
case, there is a single continuous attribute that may be either negative or positive. Different cases are
allowed to have the same attribute value, so the model can also be used for ordinal data when there are
many categories.
Example: For the normal model, the data may consist of the birth weight and residential census tract for
all newborns, with an interest in finding clusters with lower birth weight. One individual is then a case.
Alternatively, the data may consist of the average birth weight in each census tract. It is then the census
tract that is the case, and it is important to use the weighted normal model, since each average will have
a different variance due to a different number of births in each tract.
It is important to note that while the normal model uses a likelihood function based on the normal
distribution, the true distribution of the continuous attribute need not be normal. The statistical inference
(p-value) is valid for any continuous distribution. The reason for this is that the randomization is not done
by generating simulated data from the normal distribution, but rather, by permuting the space-time
locations and the continuous attribute (e.g. birth weight) of the observations. While still being formally
valid, the results can be greatly influenced by extreme outliers, so it may be wise to truncate such
observations before doing the analysis.
In the standard normal model9, it is assumed that each observation is measured with the same variance.
That may not always be the case. For example, if an observation is based on a larger sample in one
location and a smaller sample in another, then the variance of the uncertainty in the estimates will be
larger for the smaller sample. If the reliability of the estimates differs, one should instead use the
weighted normal scan statistic10, which takes these unequal variances into account. The weighted version is
obtained in SaTScan by simply specifying a weight for each observation as an extra column in the input
file. This weight may for example be proportional to the sample size used for each estimate or it may be
the inverse of the variance of the observation.
If all values are multiplied by the same positive constant or have the same constant added to them, the statistical inference will not change,
meaning that the same clusters with the same log likelihoods and p-values will be found. Only the
estimated means and variances will differ. If the weight is the same for all observations, then the weighted
normal scan statistic will produce the same results as the standard normal version. If all the weights are
multiplied by the same constant, the results will not change.
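The sketch below makes the invariance claims concrete. It computes a log likelihood ratio for a single window under a normal model with a common variance, using maximum likelihood estimates of the means inside and outside the window. The formula is the standard normal-theory likelihood and is an illustrative assumption. See Kulldorff et al. (2009)9 for the exact statistic used by SaTScan. `normal_llr` is a hypothetical name.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def normal_llr(inside: Sequence[float], outside: Sequence[float]) -> float:
    """Log likelihood ratio for one window, using maximum likelihood estimates.

    Null hypothesis: one common mean. Alternative: separate means inside and outside
    the window. Both hypotheses use a single common variance.
    """
    if not inside or not outside:
        raise ValueError("both the window and its complement need observations")
    values = list(inside) + list(outside)
    n = len(values)
    mu = _mean(values)
    var_null = sum((x - mu) ** 2 for x in values) / n
    mu_in, mu_out = _mean(inside), _mean(outside)
    var_alt = (sum((x - mu_in) ** 2 for x in inside)
               + sum((x - mu_out) ** 2 for x in outside)) / n
    # The log likelihood at the MLE is -n/2 * ln(2*pi*var) - n/2, so the ratio reduces to this.
    return (n / 2) * math.log(var_null / var_alt)
```

Adding a constant to every value leaves both variances unchanged. Multiplying every value by a positive constant scales both variances by the same factor. In both cases the ratio, and therefore the log likelihood ratio, stays the same. The sketch fails if every observation equals its group mean, because `var_alt` is then zero.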
Related Topics: Analysis Tab, Likelihood Ratio Test, Methodological Papers, Probability Model
Comparison.
circle though, so the Obs/Exp ratios provided should be viewed as a lower bound on the true value
whenever the circle extends outside the spatial study region.
The continuous Poisson model can only be used for purely spatial data. It uses a circular scanning
window of continuously varying radius up to a maximum specified by the user. Only circles centered on
one of the observations are considered, as specified in the coordinates file. If the optional grid file is
provided, the circles are instead centered on the coordinates specified in that file. The continuous Poisson
model has not been implemented to be used with an elliptic window.
Related Topics: Analysis Tab, Likelihood Ratio Test, Methodological Papers, Poisson Model,
Probability Model Comparison.
Temporal Data
For temporal and space-time data, there is an additional difference among the probability models, in the
way that the temporal data is handled. With the Poisson model, population data may be specified at one or
several time points, such as census years. The population is then assumed to exist between such time
points as well, estimated through linear interpolation between census years. With the Bernoulli, space-time permutation, ordinal, exponential and normal models, a time needs to be specified for each case and
for the Bernoulli model, for each control as well.
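The sketch below shows the linear interpolation of population between census years for the Poisson model. The function name is hypothetical. Outside the range of census years, it holds the nearest census value constant. That behaviour is an assumption of this sketch and is not documented here.

```python
from bisect import bisect_right
from typing import List, Tuple


def interpolate_population(census: List[Tuple[float, float]], year: float) -> float:
    """Linearly interpolate population at `year` from (census year, population) pairs.

    `census` must be sorted by year. Outside the census range, the nearest census
    value is returned (an assumption made for this sketch).
    """
    years = [y for y, _ in census]
    if year <= years[0]:
        return census[0][1]
    if year >= years[-1]:
        return census[-1][1]
    i = bisect_right(years, year)  # first census year strictly after `year`
    (y0, p0), (y1, p1) = census[i - 1], census[i]
    return p0 + (p1 - p0) * (year - y0) / (y1 - y0)
```

For example, with census populations of 1000 in 1990 and 2000 in 2000, the interpolated population for 1995 is 1500.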
Related Topics: Bernoulli Model, Poisson Model, Space-Time Permutation Model, Likelihood Ratio
Test, Methodological Papers.
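For the Poisson model, the likelihood function for a specific window is proportional to:
$$\left(\frac{c}{E[c]}\right)^{c}\left(\frac{C-c}{C-E[c]}\right)^{C-c} I()$$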
where C is the total number of cases, c is the observed number of cases within the window and E[c] is the
covariate adjusted expected number of cases within the window under the null-hypothesis. Note that since
the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases
outside the window. I() is an indicator function. When SaTScan is set to scan only for clusters with high
rates, I() is equal to 1 when the window has more cases than expected under the null-hypothesis, and 0
otherwise. The opposite is true when SaTScan is set to scan only for clusters with low rates. When the
program scans for clusters with either high or low rates, then I()=1 for all windows.
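The sketch below computes the log of the Poisson function above. It assumes that `c`, `expected` (E[c]) and `total_cases` (C) are already known for the window. The function name and the `direction` parameter are hypothetical. Windows excluded by the indicator function get a value of 0. This does not affect the maximization, because the log likelihood ratio of any other window is never negative.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * ln(y), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_llr(c: float, expected: float, total_cases: float, direction: str = "high") -> float:
    """Log likelihood ratio of one window under the Poisson model.

    direction: "high" scans for high rates, "low" for low rates, "both" for either.
    """
    if not 0 < expected < total_cases:
        raise ValueError("expected must lie strictly between 0 and total_cases")
    # Indicator function I(): exclude windows that go the wrong way.
    if direction == "high" and c <= expected:
        return 0.0
    if direction == "low" and c >= expected:
        return 0.0
    inside = _xlogy(c, c / expected)
    outside = _xlogy(total_cases - c, (total_cases - c) / (total_cases - expected))
    return inside + outside
```

For example, with 100 cases in total, a window with 20 observed and 10 expected cases gives 20 ln 2 + 80 ln(80/90), which is about 4.44.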
The space-time permutation model uses the same function as the Poisson model. Due to the conditioning
on the marginals, the observed number of cases is only approximately Poisson distributed. Hence, it is no
longer a formal likelihood ratio test, but it serves the same purpose as the test statistic.
For the Bernoulli model the likelihood function is1,2:
$$\left(\frac{c}{n}\right)^{c}\left(\frac{n-c}{n}\right)^{n-c}\left(\frac{C-c}{N-n}\right)^{C-c}\left(\frac{(N-n)-(C-c)}{N-n}\right)^{(N-n)-(C-c)} I()$$
where c and C are defined as above, n is the total number of cases and controls within the window, while
N is the combined total number of cases and controls in the data set.
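The sketch below computes the log of the Bernoulli likelihood above and subtracts the log likelihood under the null hypothesis of a single common proportion C/N. The result is a log likelihood ratio. The function name and the `direction` parameter are hypothetical. The indicator compares c with its null expectation nC/N.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * ln(y), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c: int, n: int, C: int, N: int, direction: str = "high") -> float:
    """Log likelihood ratio of one window under the Bernoulli model.

    c, n: cases and cases plus controls inside the window.
    C, N: the same totals for the whole data set.
    Returns 0.0 when the indicator function excludes the window.
    """
    if not (0 < n < N and 0 <= c <= n and 0 <= C - c <= N - n):
        raise ValueError("inconsistent counts")
    expected = n * C / N  # expected cases inside the window under the null hypothesis
    if direction == "high" and c <= expected:
        return 0.0
    if direction == "low" and c >= expected:
        return 0.0
    out_n, out_c = N - n, C - c
    alternative = (_xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
                   + _xlogy(out_c, out_c / out_n)
                   + _xlogy(out_n - out_c, (out_n - out_c) / out_n))
    null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return alternative - null
```

When the proportion of cases inside the window equals the overall proportion C/N, the two likelihoods coincide and the ratio is 1, so its log is 0.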
The likelihood functions for the multinomial, ordinal, exponential and normal models are more complex,
due to the more complex nature of the data. We refer to papers by Jung, Kulldorff and Richards6; Jung,
Kulldorff and Klassen7; Huang, Kulldorff and Gregorio8; Kulldorff et al.9; and Huang et al.10 for the
likelihood functions for these models. The likelihood function for the spatial variation in temporal trends
scan statistic is also more complex, as it involves the maximum likelihood estimation of several different
trend functions.
The likelihood function is maximized over all window locations and sizes, and the one with the maximum
likelihood constitutes the most likely cluster. This is the cluster that is least likely to have occurred by
chance. The likelihood ratio for this window constitutes the maximum likelihood ratio test statistic. Its
distribution under the null-hypothesis is obtained by repeating the same analytic exercise on a large
number of random replications of the data set generated under the null hypothesis. The p-value is
obtained through Monte Carlo hypothesis testing14, by comparing the rank of the maximum likelihood
from the real data set with the maximum likelihoods from the random data sets. If this rank is R (with
R = 1 when the real data set has the highest maximum likelihood), then p = R / (1 + number of simulations).
To make p a round number, the number of simulations is restricted to 999 or another number ending in 999,
such as 1999, 9999 or 99999. With 999 simulations, for example, p is always a multiple of 0.001. It is then
always clear whether to reject the null hypothesis at typical cut-off values such as 0.05, 0.01 and 0.001.
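The sketch below shows the Monte Carlo p-value calculation. It assumes that the maximum log likelihood ratio from the real data and from each random replicate have already been computed. Ties between the real and a simulated value are counted against the real data. This is a conservative choice made for the sketch and may differ from SaTScan's handling of ties.

```python
from typing import Sequence


def monte_carlo_p_value(observed: float, simulated: Sequence[float]) -> float:
    """p = R / (1 + number of simulations), where R is the rank of the observed statistic.

    Rank 1 means the observed value exceeds every simulated value. Simulated values
    greater than or equal to the observed one push the rank down (ties counted conservatively).
    """
    rank = 1 + sum(1 for s in simulated if s >= observed)
    return rank / (1 + len(simulated))
```

With 999 replicates, a real data set that beats every replicate gets p = 1/1000 = 0.001. If 49 replicates are at least as large, then R = 50 and p = 0.05.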
The SaTScan program scans for areas with high rates (clusters), for areas with low rates, or
simultaneously for areas with either high or low rates. To make correct statistical inference, the latter
option should be used rather than two separate tests for high and low rates, since two separate tests would
not account for the multiple testing involved. The most common analysis is to scan for areas with high
rates, that is, for clusters.
Secondary Clusters
For purely spatial and space-time analyses, SaTScan also identifies secondary clusters in the data set in
addition to the most likely cluster, and orders them according to their likelihood ratio test statistic. There
will almost always be a secondary cluster that is almost identical to the most likely cluster and that
has almost as high a likelihood value, since expanding or reducing the cluster size only marginally will
not change the likelihood very much. Most clusters of this type provide little additional information, but
their existence means that while it is possible to pinpoint the general location of a cluster, its exact
boundaries must remain uncertain. The user can decide to what extent overlapping clusters are reported in
the results files. The default is that geographically overlapping clusters are not reported.
There may also be secondary clusters that do not overlap spatially with the most likely cluster, and they
may be of great interest. These are always reported. The p-values for such clusters should be interpreted
in terms of the ability of the secondary cluster to reject the null hypothesis on its own strength, whether
or not the more likely clusters are true clusters. Hence, these p-values are not adjusted for the fact that
there may be other clusters in the data. If such adjustments are desired, the iterative scan statistic should
be used.
For purely temporal analyses, only the most likely cluster is reported.
Related Topics: Adjusting for More Likely Clusters, Likelihood Ratio Test, Spatial Output Tab, Criteria
for Reporting Secondary Clusters, Standard Results File.
As an advanced option, SaTScan is able to adjust the inference of secondary clusters for more likely
clusters in the data24. This is done in an iterative manner. In the first iteration SaTScan runs the standard
analysis but only reports the most likely cluster. That cluster is then removed from the data set, including
all cases and controls (Bernoulli model) in the cluster while the population (Poisson model) is set to zero
for the locations and the time period defining the cluster. In a second iteration, a completely new analysis
is conducted using the remaining data. This procedure is then repeated until there are no more clusters
with a p-value less than a user-specified maximum, or until a user-specified maximum number of iterations
has been completed, whichever comes first.
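The iteration can be sketched in Python for a purely spatial analysis with Poisson-type data, where removing a cluster means dropping its cases and setting its population to zero. The scan routine is passed in as a hypothetical callable that returns the most likely cluster and its p-value; the names and default cut-offs are illustrative assumptions, and this sketch keeps only the clusters that meet the p-value cut-off.

```python
from typing import Callable, Dict, List, Tuple

# Location ID -> (cases, population) for a purely spatial, Poisson-type analysis.
Data = Dict[str, Tuple[int, float]]
# Hypothetical scan routine: returns (location IDs of the most likely cluster, p-value).
ScanFn = Callable[[Data], Tuple[List[str], float]]


def iterative_scan(data: Data, scan: ScanFn, max_p: float = 0.05,
                   max_iterations: int = 10) -> List[Tuple[List[str], float]]:
    """Repeatedly scan, record the most likely cluster and remove it from the data."""
    remaining = dict(data)  # copy, so the caller's data are left untouched
    found: List[Tuple[List[str], float]] = []
    for _ in range(max_iterations):
        cluster, p = scan(remaining)
        if p >= max_p:  # no remaining cluster beats the user-specified maximum p-value
            break
        found.append((cluster, p))
        for loc in cluster:
            remaining[loc] = (0, 0.0)  # drop the cluster's cases, set its population to zero
    return found
```

Because each iteration is a complete new analysis on the reduced data, the p-value of a later cluster is not influenced by the more likely clusters already removed.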
For purely spatial analyses it has been shown that the resulting p-values for secondary clusters are quite
accurate and at most marginally biased.
Note that the circle of a secondary cluster may overlap with the circle of a previously detected more likely
cluster, and it may even completely encircle it so that the latter is a subset of the former. This does not
mean that the more likely cluster is detected twice. Rather, the more likely cluster is treated as a lake
with no population and no cases, and the new secondary cluster consists of the areas around that lake.
This may, for example, happen if a city has a very highly elevated risk, while the surrounding suburbs have
a modestly elevated risk. The same phenomenon may occur when doing purely temporal or space-time
analyses.
This feature is not available for the continuous Poisson model.
Related Topics: Spatial Output Tab, Criteria for Reporting Secondary Clusters, Iterative Scan,
Likelihood Ratio Test, Secondary Clusters, Standard Results File.
Covariate Adjustments
A covariate should be adjusted for when all three of the following are true:
The covariate is related to the disease in question.
The covariate is not randomly distributed geographically.
You want to find clusters that cannot be explained by that covariate.
Here are three examples:
If you are studying cancer mortality in the United States, you should adjust for age since (i)
older people are more likely to die from cancer, (ii) some areas such as Florida have a higher
percentage of older people, and (iii) you are presumably interested in finding areas where the risk of
cancer is high as opposed to areas with an older population.
If you are interested in the geographical distribution of birth defects, you do not need to
adjust for gender. While birth defects are not equally likely in boys and girls, the two genders
are distributed geographically at random at the time of birth.
If you are studying the geography of lung cancer incidence, you should adjust for smoking if
you are interested in finding clusters due to non-smoking related risk factors, but you should
not adjust for smoking if you are interested in finding clusters reflecting areas with especially
urgent needs to launch an anti-smoking campaign.
When the disease rate varies, for example, with age, and the age distribution varies in different areas, then
there is geographical clustering of the disease simply due to the age covariate. When adjusting for
categorical covariates, the SaTScan program will search for clusters above and beyond that which is
expected due to these covariates. When more than one covariate is specified, each one is adjusted for as
well as all the interaction terms between them.
Related Topics: Covariate Adjustment Using the Input Files, Covariate Adjustment using Statistical
Regression Software, Covariate Adjustment Using Multiple Data Sets, Methodological Papers.
$$E[c] = pC/P$$
where c is the observed number of cases and p the population in the location of interest, while C and P are
the total number of cases and population respectively. Let $c_i$, $p_i$, $C_i$ and $P_i$ be defined in the same way, but
for covariate category i. The indirectly standardized covariate adjusted expected number of cases (spatial
analysis) is:
$$E[c] = \sum_i E[c_i] = \sum_i p_i C_i / P_i$$
The same principle is used when calculating the covariate adjusted number of cases for the space-time
scan statistic, although the formula is more complex due to the added time dimension.
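The indirectly standardized expected count for a purely spatial analysis can be sketched as below. The function name, the category labels and the numbers in the usage note are made up for illustration.

```python
from typing import Mapping


def expected_cases(pop: Mapping[str, float],
                   total_cases: Mapping[str, float],
                   total_pop: Mapping[str, float]) -> float:
    """Indirectly standardized expected count: sum over categories i of p_i * C_i / P_i.

    pop:         population p_i of each covariate category in the location of interest
    total_cases: total cases C_i of each category in the whole data set
    total_pop:   total population P_i of each category in the whole data set
    """
    return sum(pop[i] * total_cases[i] / total_pop[i] for i in pop)
```

For a location with 1000 young and 1000 old people, where the whole data set has 100 cases among 100,000 young people and 300 cases among 50,000 old people, the adjusted expectation is 1 + 6 = 7 cases, while the unadjusted formula gives 2000 × 400 / 150,000 ≈ 5.33 cases.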
Since the space-time permutation model automatically adjusts for purely spatial and purely temporal
variation, there is no need to adjust for covariates in order to account for different spatial or temporal
densities of these covariates. For example, there is no need to adjust for age simply because some places
have a higher proportion of old people. Rather, covariate adjustment is used if there is space-time
interaction due to this covariate rather than to the underlying disease process. For example, if children get
sick mostly in the summer and adults mostly in the winter, then there will be age-generated space-time
interaction clusters in areas with many children in the summer and vice versa. When including child/adult
as a covariate, these clusters are adjusted away.
Note: Too many covariate categories can create problems. For the space-time permutation model, the
adjustment is made at the randomization stage, so that each covariate category is randomized
independently. If there are too many covariate categories, so that all or most cases in a category belong to
the same spatial location or the same aggregated time interval, then there is very little to randomize, and
the test becomes meaningless.
Related Topics: Covariate Adjustments, Covariate Adjustment using Statistical Regression Software,
Covariate Adjustment Using Multiple Data Sets, Methodological Papers, Poisson Model, Space-Time
Permutation Model, Case File, Population File.
obtain risk estimates for each of the covariates. The second step is to adjust the survival and censoring
time up or down for each individual based on the risk estimates for his or her covariates.
For the normal model, covariates can be adjusted for by first doing linear regression using standard
statistical software, and then replacing the observed values with their residuals.
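For a single continuous covariate, the residual approach for the normal model can be sketched with ordinary least squares in plain Python, standing in for the statistical software mentioned above. With several covariates one would use multiple regression instead; the function name is hypothetical.

```python
from typing import List, Sequence


def ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Fit y = a + b*x by ordinary least squares and return the residuals y - (a + b*x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # assumes the covariate is not constant
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

The residuals then replace the observed values in the case file, so that any remaining clusters reflect variation not explained by the covariate.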
Related Topics: Covariate Adjustments, Covariate Adjustment Using the Input Files, Covariate
Adjustment Using Multiple Data Sets, Exponential Model, Methodological Papers, Poisson Model,
Population File.
specify whether a temporal adjustment should be made, and if so, whether to adjust with a percent change
or non-parametrically.
Sometimes, the best way to adjust for a temporal trend is by specifying the percent yearly increase or
decrease in the rate that is to be adjusted for. This is a log linear adjustment. Depending on the
application, one may adjust either for a trend that SaTScan estimates from the data being analyzed, or
for a trend estimated from national or other similar data. In the latter case, the percent increase or
decrease must be calculated using standard statistical regression software such as SAS or R, and then
inserted on the Risk Adjustments Tab.
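A hedged sketch of a log-linear adjustment: each yearly expected count is multiplied by (1 + r/100)^t for year index t, and the results are rescaled so that the total expected count is unchanged. The rescaling step is our assumption about how such an adjustment interacts with conditioning on the total number of cases, not a description of SaTScan internals. The helper converts a slope b, from a regression of the log rate on year, into a percent yearly change 100(e^b - 1). Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def percent_change_from_log_slope(b: float) -> float:
    """Convert slope b of log(rate) versus year into a percent yearly change."""
    return 100.0 * (math.exp(b) - 1.0)


def apply_log_linear_trend(expected: Sequence[float], percent_per_year: float) -> List[float]:
    """Scale yearly expected counts by (1 + r/100)**t, for t = 0, 1, 2, ...

    The trended counts are rescaled so that their total equals the original total
    (an assumption for this sketch, mirroring conditioning on the total case count).
    """
    factor = 1.0 + percent_per_year / 100.0
    trended = [e * factor ** t for t, e in enumerate(expected)]
    scale = sum(expected) / sum(trended)
    return [x * scale for x in trended]
```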
For space-time analyses, it is also possible to adjust for a temporal trend non-parametrically. This adjusts
the expected count separately for each aggregated time interval, removing all purely temporal clusters.
The randomization is then stratified by time interval to ensure that each time interval has the same
number of events in the real and random data sets.
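The non-parametric adjustment of the expected counts can be sketched as a rescaling within each aggregated time interval, so that the expected counts in an interval sum to the observed cases in that interval. The (location, interval) data layout and the function name are assumptions, the stratified randomization step is not shown, and every interval is assumed to have a positive expected total.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, int]  # (location ID, aggregated time interval)


def nonparametric_time_adjust(expected: Dict[Key, float],
                              cases: Dict[Key, int]) -> Dict[Key, float]:
    """Rescale expected counts so each time interval's total matches its observed cases."""
    exp_t: Dict[int, float] = defaultdict(float)
    obs_t: Dict[int, float] = defaultdict(float)
    for (_, t), e in expected.items():
        exp_t[t] += e
    for (_, t), n in cases.items():
        obs_t[t] += n
    return {key: e * obs_t[key[1]] / exp_t[key[1]] for key, e in expected.items()}
```

In the test data, interval 1 has twice as many cases as expected in both locations; after the adjustment that purely temporal excess is absorbed into the expected counts, while the spatial contrast within interval 0 is left for the scan to find.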
The ability to adjust for temporal trends is much more limited for the Bernoulli, multinomial, ordinal,
normal and exponential models, as none of the above features can be used. Instead, the time must be
divided into discrete time periods, with the cases and controls in each period corresponding to a separate
data set with separate case and control files. The analysis is then done using multiple data sets.
Related Topics: Spatial and Temporal Adjustments Tab, Time Aggregation, Adjusting for Day-of-Week
Effects, Poisson Model.
This option is not available for the Bernoulli, multinomial, ordinal, exponential, normal or space-time
permutation models, in the latter case because the method automatically adjusts for any purely spatial
clusters.
Note: It is not possible to simultaneously adjust for spatial clusters and purely temporal clusters using
stratified randomization, and if both types of adjustments are desired, the space-time permutation model
should be used instead.
Related Topics: Spatial and Temporal Adjustments Tab, Poisson Model, Adjusting for Temporal Trends.
Missing Data
If there is missing data for some locations and times, it is important to adjust for that in the analysis. If
not, you may find statistically significant low rate clusters where there is missing data, or statistically
significant high rate clusters in other locations, even though these are simply artifacts of the missing data.
Bernoulli Model
To adjust a Bernoulli model analysis for missing data, do the following. If cases are missing for a
particular location and time period, remove the controls for that same location and time. Likewise, if
controls are missing for a particular location and time, remove the cases for that same location and time.
This needs to be done before providing the data to SaTScan. If both cases and controls are missing for a
location and time, no modification of the input data is needed.
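The preprocessing described above could be scripted before the data are given to SaTScan. In this sketch, None marks a missing count and an absent key means a count of zero; both conventions, and the function name, are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (location ID, time period)


def drop_unmatched(cases: Dict[Key, Optional[int]],
                   controls: Dict[Key, Optional[int]]) -> Tuple[Dict[Key, int], Dict[Key, int]]:
    """Drop every location/time where either cases or controls are missing (None)."""
    keys = set(cases) | set(controls)
    kept = [k for k in keys
            if cases.get(k, 0) is not None and controls.get(k, 0) is not None]
    return ({k: cases.get(k, 0) for k in kept},
            {k: controls.get(k, 0) for k in kept})
```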
In a power comparison2, it was shown that Turnbull's method has higher power if the true cluster size is
within about 20 percent of what is specified by that method, while the spatial scan statistic has higher
power otherwise. Note that the cluster size in Turnbull's method must be specified before looking at the
data, or the procedure is invalid.
Input Data
Data Requirements
Required Files: The input data should be provided in a number of files. A coordinates file is always
needed and a case file is needed for all probability models except the continuous Poisson model. The
Poisson model also requires a population file while the Bernoulli model requires a control file.
Optional Files: One may also specify an optional special grid file that contains geographical coordinates
of the centroids defining the circles used by the scan statistic. If such a file is not specified, the
coordinates in the coordinates file will be used for that purpose. As part of the advanced features, there is
also an optional max circle size file, an optional adjustments file, an optional non-Euclidian neighbors
file and an optional meta location file.
File Format: The data input files must be in SaTScan ASCII file format or you may use the SaTScan
import wizard for shapefiles, dBase, comma delimited or space delimited files. Using such files, the
wizard will automatically generate SaTScan file format files. Both options are described below.
Spatial Resolution: For the discrete scan statistics, separate data locations may be specified for
individuals or data may be aggregated for states, provinces, counties, parishes, census tracts, postal code
areas, school districts, households, etc.
Temporal Information: To do a temporal, a space-time or a spatial variation in temporal trends analysis,
it is necessary to have a time related to each case, and if the Bernoulli model is used, for each control as
well. This time can be specified as a day, month or year. When the discrete Poisson model is used the
background denominator population is assumed to exist continuously over time, although not necessarily
at a constant level. The population file requires a date to be specified for each population count. For times
in-between those dates, SaTScan will estimate the population through linear interpolation. If all
population counts have the same date, the population is assumed to be constant over time.
Multiple Data Sets: It is possible to specify multiple case files, each representing a different data set,
with information about different diseases or about men versus women respectively. For the Bernoulli
model, each case file must be accompanied by its own control file, and for the Poisson model, each case
file must be accompanied by its own population file. The maximum number of data sets that SaTScan
can analyze is twelve.
Covariate Adjustments: With the Poisson and space-time permutation models, it is possible to adjust for
multiple categorical covariates by including them in the case and population files. For the Bernoulli,
ordinal or exponential models, covariates can be adjusted for using multiple data sets.
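The linear interpolation of population counts described under Temporal Information can be sketched as below. Holding the first and last counts constant outside the range of the given dates is our assumption; the text only specifies interpolation between dates and a constant population when all counts share one date. The function name is hypothetical.

```python
import bisect
from typing import List, Tuple


def population_at(points: List[Tuple[float, float]], t: float) -> float:
    """Linearly interpolate population between dated counts; points are sorted by time."""
    times = [time for time, _ in points]
    if t <= times[0]:
        return points[0][1]  # assumption: hold the first count before the earliest date
    if t >= times[-1]:
        return points[-1][1]  # assumption: hold the last count after the latest date
    i = bisect.bisect_right(times, t)
    (t0, p0), (t1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
```

With a single dated count, both boundary checks apply to every time, so the population is constant, matching the rule for population counts that all share the same date.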
Related Topics: Input Tab, Multiple Data Sets Tab, Case File, Control File, Population File,
Coordinates File, Grid File, SaTScan Import Wizard, SaTScan ASCII File Format, Covariate
Adjustments.
Case File
The case file provides information about cases, and it is used for all probability models except the continuous Poisson model. It should contain
the following information:
Location ID: Any numerical value or string of characters. Empty spaces may not form part of the
id.
Number of Cases: The number of cases for the specified location, time and covariates. For the
discrete Poisson, Bernoulli and space-time permutation models, this is the number of observations
or individuals with the characteristic of interest, such as cancer or low birth weight. For the
ordinal, multinomial, normal and exponential models, it is the total number of observations or
individuals in the locations, irrespective of the value of their categorical characteristic or
continuous attribute value.
Date/Time: Optional. May be specified either in years, months or days, or in a generic format.
The format must coincide with the time precision format specified on the Input Tab. Unless the
temporal data check is disabled, all case times must fall within the study period as specified on
the Input Tab.
Attribute: For the multinomial, ordinal, exponential and normal models only. A variable
describing some characteristic of the case. These may be a category (multinomial or ordinal
model), survival time (exponential model), or a continuous variable value (normal model). The
categories for the multinomial and ordinal models can be specified as any positive or negative
numerical value. Survival times must be positive numbers. The numbers for the normal model
can be positive or negative.
Censored: For the exponential model only. Censored is a 0/1 variable with censored=1 and
uncensored=0.
Weight: Optional. For the normal model only. Required if covariates are used, even if all
observations have the same variance, in which case all weights should be set to one.
Covariates: Optional. For discrete Poisson, space-time permutation and normal models only.
Any number of categorical covariates may be specified, either as numbers or as character strings.
For the normal model, covariates can only be included if weights are also provided.
Example: If on April 1, 2004 there were 17 male and 12 female cases in New York, the following
information would be provided:
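To illustrate the column order listed above (location ID, number of cases, date, then covariates), the hypothetical helper below formats such records as space-separated lines. The location ID NewYork and the covariate labels male and female are made-up values chosen for this example, and the date uses the year/month/day format.

```python
from typing import List, Sequence


def case_file_lines(records: Sequence[tuple]) -> List[str]:
    """Format (location ID, #cases, date, covariates...) records as space-separated lines."""
    lines = []
    for loc_id, n_cases, when, *covariates in records:
        if " " in loc_id:
            raise ValueError(f"location ID may not contain spaces: {loc_id!r}")
        lines.append(" ".join([loc_id, str(n_cases), when, *covariates]))
    return lines


# Hypothetical records for the example: 17 male and 12 female cases on April 1, 2004.
print("\n".join(case_file_lines([
    ("NewYork", 17, "2004/04/01", "male"),
    ("NewYork", 12, "2004/04/01", "female"),
])))
```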
Note: For the weighted normal model, there can be only one case (observation) per line, and hence, if
weights are specified, the second column must be all ones.
Note: Multiple lines may be used for different cases with the same location, time and attributes. SaTScan
will automatically add them.
Note: This file is not used for the continuous Poisson model.
Related Topics: Input Tab, Case File Name, Multiple Data Sets Tab, Covariate Adjustment Using Input
Files, SaTScan Import Wizard, SaTScan ASCII File Format.
Control File
The control file is only used with the Bernoulli model. It should contain the following information:
location id: Any numerical value or string of characters. Empty spaces may not form part of the
id.
#controls: The number of controls for the specified location and time.
time: Optional. Time may be specified either in years, months or days, or in a generic format.
All control times must fall within the study period as specified on the Input Tab. The format of
the times must be the same as in the case file.
Note: Multiple lines may be used for different controls with the same location and time.
SaTScan will automatically add them.
Related Topics: Input Tab, Control File Name, Multiple Data Sets Tab, SaTScan Import Wizard,
SaTScan ASCII File Format.
Population File
The population file is used for the discrete Poisson model, providing information about the background
population at risk. This may be actual population counts from a census, or it could, for example, be
covariate adjusted expected counts from a statistical regression model. It should contain the following
information:
location id: Any numerical value or string of characters. Empty spaces may not form part of the
id.
time: The time to which the population size refers. May be specified either in years, months or
days, or in a generic format. If the population time is unknown but identical for all population
numbers, then a dummy year must be given; the choice does not affect the results.
population: Population size for a particular location, year and covariate combination. If the
population size is zero for a particular location, year, and set of covariates, then it should be
included in the population file specified as zero. The population can be specified as a decimal
number to reflect a population size at risk rather than an actual number of people.
covariates: Optional. Any number of categorical covariates may be specified, each represented
by a different column separated by empty spaces. May be specified numerically or through
characters. The covariates must be the same as in the case file.
Example: If age and sex are the covariates included, with 18 different age groups, then there
should be 18x2=36 rows for each year and census area. With 3 different census years, and 32
census areas, the file will have a total of 3456 rows and 5 columns.
Note: Multiple lines may be used for different population groups with the same location, time and
covariate attributes. SaTScan will automatically add them.
Note: For a purely temporal analysis with the discrete Poisson model, it is not necessary to specify a
population file if the population is constant over time.
Related Topics: Input Tab, Population File Name, Multiple Data Sets Tab, Covariate Adjustment Using
Input Files, Max Circle Size File, SaTScan Import Wizard, SaTScan ASCII File Format.
Coordinates File
The coordinates file provides the geographic coordinates for each location ID. Each line of the file
represents one geographical location. Area-based information may be aggregated and represented by one
single geographical point location. Coordinates may be specified either using the standard Cartesian
coordinate system or in latitude and longitude. If two different location IDs have exactly the same
coordinates, then the data for the two are combined and treated as a single location.
A coordinates file is not needed for purely temporal analyses.
Related Topics: Input Tab, Coordinates File Name, Coordinates, Cartesian Coordinates, Latitude and
Longitude, Grid File.
Cartesian Coordinates
Cartesian is the mathematical name for the regular planar x,y-coordinate system taught in high school.
These may be specified in two, three or any number of dimensions. The SaTScan program will
automatically read the number of dimensions, which must be the same for all coordinates. If Cartesian
coordinates are used, the coordinates file should contain the following information:
location id: Any numerical value or string of characters. Empty spaces may not form part of the
id.
coordinates: The coordinates must all be specified in the same units. There is no upper limit on
the number of dimensions.
Note: If you have more than 10 dimensions you cannot use the SaTScan Import Wizard for the
coordinates and grid files, but must specify them using the SaTScan ASCII file format.
Note: The continuous Poisson model only works in two dimensions.
Related Topics: Input Tab, Coordinates, Latitude and Longitude, Coordinates File, Grid File, SaTScan
Import Wizard, SaTScan ASCII File Format.
location id: Any numerical value or string of characters. Empty spaces may not form part of the
id.
Note: When coordinates are specified in latitudes and longitudes, SaTScan does not perform a projection
of these coordinates onto a planar space. Rather, SaTScan draws perfect circles on the surface of the
spherical earth.
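Distances on the sphere can be sketched with the haversine formula, a common way to compute great-circle distances between latitude/longitude points. The formula and the mean Earth radius of 6371 km are assumptions for illustration; this section does not state how SaTScan computes distances internally. The function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(center: Tuple[float, float], point: Tuple[float, float], radius_km: float) -> bool:
    """True if point lies within a circle of radius_km drawn on the sphere around center."""
    return great_circle_km(*center, *point) <= radius_km
```

One degree of longitude along the equator comes out at about 111.19 km under these assumptions.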
Note: Latitude and longitude cannot be used for the continuous Poisson model, or when an elliptic spatial
window is used.
Related Topics: Input Tab, Coordinates File, Coordinates, Cartesian Coordinates, Latitude and
Longitude, Grid File, SaTScan Import Wizard, SaTScan ASCII File Format, Computing Time.
Grid File
The optional grid file defines the centroids of the circles used by the scan statistic. If no grid file is
specified, the coordinates given in the coordinates file are used for this purpose. Each line in the file
represents one circle centroid. There should be at least two variables representing Cartesian (standard)
x,y-coordinates or exactly two variables representing latitude and longitude. The choice between
Cartesian and latitude/longitude must coincide with the coordinates file, as must the number of
dimensions.
The grid file will normally only include spatial coordinates, while the temporal range of potential clusters
is specified on the Temporal Window Tab. That does not allow the user to specify a different temporal
range for different locations. If that is desired, four more columns can be added to the grid file
representing the earliest allowed start time of the cluster, the latest allowed start time, the earliest allowed
end time and the latest allowed end time, in that order. If these columns are specified for some grid points,
but not for the remainder, then the remaining grid points will have the temporal cluster specifications defined
on the Temporal Window Tab.
If only one centroid is specified in the grid file, one gets a focused cluster test rather than a scan statistic.
Such focused tests are useful to evaluate whether there is a cluster around a pre-defined location such as a
toxic waste site. If more than one, but still a small number of, centroids are specified in the grid file, one
gets a multi-focused test, looking for clusters around one or more of the centroids.
Related Topics: Input Tab, Grid File Name, Coordinates, Cartesian Coordinates, Latitude and
Longitude, Coordinates File, SaTScan Import Wizard, SaTScan ASCII File Format, Temporal Window
Tab, Computing Time.
The first column of this file contains the location IDs defining the centroids of the scanning window. The
subsequent entries on each row are then the centroid's neighbors in order of closeness. The scanning
window will expand in size until there are no more neighbors provided for that row. For example,
with the row 1 2 3 4, SaTScan will evaluate [1], [1,2], [1,2,3] and [1,2,3,4] as potential clusters. If the
second row is 4 1 3 2, SaTScan will also evaluate [4], [1,4] and [1,3,4]. Note that, with these two rows
the cluster [1,2,3,4] is specified twice, which is okay but redundant. It is allowed to have multiple rows
for the same location ID centroid, each with a different set of closest neighbors, so the third row could for
example be 1 3 5 7 9.
The non-Euclidian neighbor file automatically defines the maximum cluster size for each centroid, since
SaTScan will stop adding locations to the cluster when the end of the row is reached. Hence, there is
typically no need to specify a Maximum Spatial Cluster Size on the Spatial Window Tab. If a maximum
is specified on that tab, then SaTScan will honor that maximum and stop adding locations once it is
reached, even if the end of the row in the non-Euclidian neighbors file has not been reached.
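The way each row expands into nested candidate clusters can be sketched as below. Capping clusters by a number of locations is a simplification standing in for the Maximum Spatial Cluster Size setting, which SaTScan does not express that way; the function and parameter names are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence, Set


def candidate_clusters(rows: Sequence[Sequence[str]],
                       max_locations: Optional[int] = None) -> Set[FrozenSet[str]]:
    """Expand each neighbors-file row into nested candidate clusters."""
    clusters: Set[FrozenSet[str]] = set()
    for row in rows:
        limit = len(row) if max_locations is None else min(len(row), max_locations)
        for k in range(1, limit + 1):
            clusters.add(frozenset(row[:k]))  # centroid plus its k - 1 closest neighbors
    return clusters
```

With the two rows from the example, 1 2 3 4 and 4 1 3 2, the duplicate [1,2,3,4] collapses into one set entry, leaving seven distinct candidate clusters.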
Note: The neighbors file cannot be used with the continuous Poisson model.
Related Topics: Coordinates File, Input Tab, Maximum Spatial Cluster Size, Meta Location File, Spatial
Neighbors Tab, SaTScan ASCII File Format.
The name of the special max circle size file is specified on the Analysis Tab > Advanced Features >
Spatial Window Tab.
Note: If a location ID is missing from this file, the population is assumed to be zero. If a location ID
occurs more than once, the population numbers will be added.
Related Topics: Input Tab, Population File, Spatial Window Tab, SaTScan Import Wizard, SaTScan
ASCII File Format.
Adjustments File
The adjustments file can be used to adjust a discrete Poisson model analysis for any temporal, spatial and
space-time anomalies in the data, with a known relative risk. It can for example be used to adjust for
missing or partially missing data. (Note: Covariates are adjusted for by using the case and population files
or by analyzing multiple data sets, not with this file). The adjustments file should contain one or more
lines for each location for which adjustments are warranted, with the following information:
Location ID: Any numerical value or string of characters. Empty spaces may not form part of the id.
Alternatively, it is possible to specify All, in which case all locations will be adjusted with the same
relative risk.
Relative Risk: Any non-negative number, representing how much more common the disease is in this
location and time period compared to the baseline. Setting a value of one is equivalent to not doing any
adjustment. A value greater than one is used to adjust for an increased risk and a value less than one to
adjust for a lower risk. A relative risk of zero is used to adjust for missing data for that particular time
and location.
Start Time: Optional. The start of the time period to be adjusted using this relative risk.
End Time: Optional. The end of the time period to be adjusted using this relative risk.
If no start and end times are given, the whole study period will be adjusted for that location. If All is
selected instead of a location ID, but no start or end times are given, that has the same effect as when no
adjustments are done.
The name of the adjustments file is specified on the Analysis Tab > Advanced Features > Risk
Adjustments.
Note: Assigning a relative risk of x to half the locations is equivalent to assigning a relative risk of 1/x to
the other half. Assigning the same relative risk to all locations and time periods has the same effect as not
adjusting at all.
Note: It is permissible to adjust the same location and time periods multiple times, through different rows
with different relative risks. SaTScan will simply multiply the relative risks. For example, if you adjust
location A with a relative risk of 2 for all time periods, and you adjust 1990 with a relative risk of 2 for all
locations, then the 1990 entry for location A will be adjusted with a relative risk of 2*2=4.
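The multiplication rule in the note above can be sketched as follows, with times reduced to years. The row layout mirrors the adjustments file fields, but the tuple representation and the function name are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (location ID or "All", relative risk, start year or None, end year or None)
Row = Tuple[str, float, Optional[int], Optional[int]]


def combined_relative_risk(rows: List[Row], location: str, year: int) -> float:
    """Multiply the relative risks of every adjustments row that covers (location, year)."""
    rr = 1.0  # no matching row means no adjustment
    for loc, risk, start, end in rows:
        if loc not in ("All", location):
            continue
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        rr *= risk
    return rr
```

With rows for location A at relative risk 2 in all periods and for All locations at relative risk 2 in 1990, the 1990 entry for A gets 2 × 2 = 4, while A in other years and other locations in 1990 get 2.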
Related Topics: Adjustments with Known Relative Risk, Missing Data, Spatial and Temporal
Adjustments Tab, SaTScan Import Wizard, SaTScan ASCII File Format.
Step 5: Saving the Imported File or Reading Directly from the Source File
The imported file, which is in SaTScan ASCII file format, can be saved for current and future use. That
way, you do not have to go through the Import Wizard the next time you want to use the same file.
Alternatively, you may save the settings used, and read directly from the source file rather than through
the creation of an intermediate file in SaTScan format. If the latter is done, it is okay if the content of the
source file changes for a subsequent analysis, but the structure must be unchanged.
Related Topics: Input Tab, Case File, Control File, Population File, Coordinates File, Grid File, Max
Circle Size File, Adjustments File.
<location id> <#cases> <time> <attribute> <censored> <weight> <covariate#1> ... <covariate#N>
The use of attributes, censored, weight and covariates depends on the probability model, as
shown in Table 1.
Table 1: Case file fields by probability model
Probability Model        attribute          censored   weight     covariates
Discrete Poisson         n/a                n/a        n/a        optional
Bernoulli                n/a                n/a        n/a        n/a
Space-Time Permutation   n/a                n/a        n/a        optional
Multinomial              category           n/a        n/a        n/a
Ordinal                  category           n/a        n/a        n/a
Exponential              survival time      optional   n/a        n/a
Normal                   continuous value   n/a        optional   optional
<latitude> <longitude> OR
...
<location ID> <location ID of closest neighbor> <location ID of 2nd closest neighbor> etc
Time Formats
Times must be entered in a specific format. Generic time is specified using any negative or positive
integer in the range -200,000 to 2,900,000. If you have times outside this interval, simply add the same
constant to (or subtract it from) all the times. The valid date formats are:
2010
2010/06, 2010/06/26
2010-06, 2010-06-26
06/2010, 06/26/2010
06-2010, 06-26-2010
Single digit days and months may be specified with one or two digits. For example, September 9, 2002,
can be written as 2002/9/9, 2002/09/09, 2002/09/9, 2002/9/09, 2002-9-9, etc.
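A sketch of a validator for the date formats listed above. The regular expressions, the requirement that one date use a single separator throughout, and the (year, month, day) return value are assumptions for illustration, not SaTScan's own parser; generic integer times are not handled.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Year first: 2010, 2010/06, 2010/06/26, 2010-06, 2010-06-26 (one separator per date).
_YMD = re.compile(r"^(\d{4})(?:([/-])(\d{1,2})(?:\2(\d{1,2}))?)?$")
# Month first: 06/2010, 06/26/2010, 06-2010, 06-26-2010.
_MDY = re.compile(r"^(\d{1,2})([/-])(?:(\d{1,2})\2)?(\d{4})$")


def parse_satscan_date(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (year, month or None, day or None); raise ValueError for invalid dates."""
    m = _YMD.match(text)
    if m:
        year, month, day = m.group(1), m.group(3), m.group(4)
    else:
        m = _MDY.match(text)
        if not m:
            raise ValueError(f"unrecognized date format: {text!r}")
        month, day, year = m.group(1), m.group(3), m.group(4)
    y = int(year)
    mo = int(month) if month else None
    d = int(day) if day else None
    date(y, mo or 1, d or 1)  # raises ValueError for impossible dates such as 2010/02/30
    return (y, mo, d)
```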
Note: SaTScan also supports a few other time formats used in earlier versions, but they are no longer
recommended.
Related Topics: Input Tab, Case File, Control File, Population File, Coordinates File, Grid File, Max
Circle Size File, Neighbors File, Adjustments File, SaTScan Import Wizard.
Input Tab
Time Precision
Indicate whether the case file and the control file (when applicable) contain information about the time of
each case (and control), and if so, whether the times should be read as generic times, days, months or years.
If the time precision is specified to be days but the precision in the case or control file is in months or years,
then there will be an error. If the time precision is specified as years, but the case or control file includes
some dates specified in terms of the month or day, then the month or day will be ignored.
For a purely spatial analysis, the case and control file need not contain any times. If they do, it has to be
specified that they do contain this information so that SaTScan knows how to read the file, but the
information is ignored.
Note: The choice defines only the precision for the times in the case and control files. The precision of
the times in the population file can be different, except that if one has generic times the other must also
have generic times.
Related Topics: Input Tab, Case File, Control File, Study Period, Time Aggregation.
Study Period
Specify the start and end date of the time period under study. This must be done even for a purely spatial
analysis in order to calculate the expected number of cases correctly. Allowable years are those between
1753 and 9999.
All times in the case and control files should fall on or between the start and end date of the study period.
Dates in the population file are allowed to be outside the start and end date of the study period.
Start Date/Time: The earliest date/time to be included in the study period.
End Date/Time: The latest date/time to be included in the study period.
Note: The start and end dates cannot be specified to a higher precision than the precision of the times in
the case and control files.
If the user does not specify month, then by default it will be set to January for the start date and to
December for the end date. Likewise, if day is not specified, then by default it will be set to the first of the
month for the start date and the last of the month for the end date.
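The default rules for incomplete start and end dates can be sketched as follows. The helper `study_period_bounds` is hypothetical and only illustrates the stated rules: January and the first of the month for the start date, December and the last of the month for the end date, with years restricted to 1753 through 9999.

```python
import calendar
from datetime import date


def study_period_bounds(start_year, end_year, start_month=None, end_month=None,
                        start_day=None, end_day=None):
    """Apply the default month/day rules for the study period (hypothetical helper)."""
    s_month = 1 if start_month is None else start_month    # default: January
    s_day = 1 if start_day is None else start_day          # default: first of month
    e_month = 12 if end_month is None else end_month       # default: December
    # calendar.monthrange(year, month)[1] is the number of days in that month.
    e_day = calendar.monthrange(end_year, e_month)[1] if end_day is None else end_day
    start, end = date(start_year, s_month, s_day), date(end_year, e_month, e_day)
    if start.year < 1753 or end.year > 9999:
        raise ValueError("study period years must lie between 1753 and 9999")
    if start > end:
        raise ValueError("start date is after end date")
    return start, end


print(study_period_bounds(2010, 2012, end_month=2))
# (datetime.date(2010, 1, 1), datetime.date(2012, 2, 29))
```

Using `calendar.monthrange` means leap years are handled automatically for the "last of the month" default.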
Related Topics: Input Tab, Case File, Control File, Time Precision, Time Aggregation.
38
Coordinates
Specify the type of coordinates used by the coordinates file and the grid file, as either Cartesian or
latitude/longitude. Cartesian is the mathematical name for the regular x/y-coordinate system taught in
high school. Latitude/longitude cannot be used for the continuous Poisson model.
Related Topics: Cartesian Coordinates, Latitude/Longitude, Coordinates File, Grid File.
39
Analysis Tab
Type of Analysis
SaTScan may be used for purely spatial, purely temporal and space-time analyses, as well as for spatial
variation in temporal trends. A purely spatial analysis ignores the time of cases, even when such data are
provided. A purely temporal analysis ignores the geographical location of cases, even when such
information is provided.
Purely temporal and space-time data can be analyzed in either retrospective or prospective fashion. In a
retrospective analysis, the analysis is done only once for a fixed geographical region and a fixed study
period. SaTScan scans over multiple start dates and end dates, evaluating both alive clusters, which last
until the study period end date, and historic clusters, which ceased to exist before the study period
end date. The prospective option is used for the early detection of disease outbreaks, when analyses are
repeated every day, week, month or year. Only alive clusters, that is, clusters that reach all the way to the
current time as defined by the study period end date, are then searched for.
Related Topics: Spatial Temporal and Space-Time Scan Statistics, Analysis Tab, Methodological
Papers, Computing Time, Spatial Window Tab, Temporal Window Tab, Time Aggregation.
40
Probability Model
There are eight different probability models that can be used: discrete Poisson, Bernoulli, space-time
permutation, multinomial, ordinal, exponential, normal and continuous Poisson. For purely spatial
analyses, the Poisson and Bernoulli models are good approximations for each other in many situations.
Temporal data are handled differently, so the models differ more for temporal and space-time analyses.
Discrete Poisson Model: The discrete Poisson model should be used when the background
population reflects a certain risk mass such as total person years lived in an area. The cases are
then included as part of the population count.
Bernoulli Model: The Bernoulli model should be used when the data set contains individuals who may
or may not have a disease and for other 0/1 type variables. Those who have the disease are cases and
should be listed in the case file. Those without the disease are 'controls', listed in the control file. The
controls could be a random set of controls from the population, or better, the total population except for
the cases. The Bernoulli model is a special case of the ordinal model when there are only two categories.
Space-Time Permutation Model: The space-time permutation model should be used when only case
data is available, and when one wants to adjust for purely spatial and purely temporal clusters.
Multinomial Model: The multinomial model is used when individuals belong to one of three or more
categories, and when there is no ordinal relationship between those. When there are only two categories,
the Bernoulli model should be used instead.
Ordinal Model: The ordinal model is used when individuals belong to one of three or more categories,
and when there is an ordinal relationship between those categories such as small, medium and large.
When there are only two categories, the Bernoulli model should be used instead.
Exponential Model: The exponential model is used for survival time data, to search for spatial and/or
temporal clusters of exceptionally short or long survival. The survival time is a positive continuous
variable. Censored survival times are allowed for some but not all individuals.
Normal Model: The normal model is used for continuous data. Observations may be either positive or
negative.
Continuous Poisson Model: The continuous Poisson model should be used when the null
hypothesis is that observations are distributed randomly with constant intensity according to a
homogeneous Poisson process over a user defined study area.
Related Topics: Analysis Tab, Bernoulli Model, Exponential Model, Methodological Papers, Ordinal
Model, Poisson Model, Probability Model Comparison, Space-Time Permutation Model.
41
A new polygon is defined by first clicking the add button on the left to add a polygon, and then clicking
the add button on the right to add a linear inequality. The first inequality is then specified using the
equation editor at the bottom, followed by a click on the update button. After that, another inequality is
added, and so on, until all the polygons have been defined. If you need to change an inequality, use the
mouse to highlight the inequality you want to change, make the desired change in the equation editor, and
then click on the update button.
Note: The polygons must be non-overlapping. They do not need to be contiguous.
Related Topics: Analysis Tab, Continuous Poisson Model.
Time Aggregation
Space-time analyses are sometimes very computer intensive. To reduce the computing time, case times
may be aggregated into time intervals. Another reason for doing so is to adjust for cyclic temporal trends.
For example, when using intervals of one year, the analysis will automatically be adjusted for seasonal
variability in the counts, and when using time intervals of 7 days, it will automatically adjust for weekday
effects.
Units: The units in which the length of the time intervals is specified. This can be years, months, days
or generic. The units of the time intervals cannot be more precise than the time precision specified on the
input tab. If generic time is used in the case file, the unit for time aggregation must also be generic,
and vice versa.
Length: The length of the time intervals in the specified units.
Example: If interval units are years and the length is two, then the time intervals will be two years long.
Note: If the time interval length does not divide the length of the whole study period evenly, the earliest
time interval will be the remainder after the other intervals have received their proper length. Hence, the
first time interval may be shorter than the specified length. For a spatial variation in temporal trends
analysis, all time intervals must be of equal length, and if the first time interval is shorter, a warning is
generated and that interval is ignored in the analysis.
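A minimal sketch of the interval layout described in the note, assuming integer time units (days or generic time) and an inclusive study period. Intervals are built backwards from the end date, so any remainder ends up in a shorter first interval. The `drop_short_first` flag is a hypothetical stand-in for the spatial variation in temporal trends behavior. Month and year units would need calendar arithmetic and are not shown.

```python
import warnings


def aggregation_intervals(start, end, length, drop_short_first=False):
    """Split the inclusive integer period [start, end] into intervals of `length`.

    Intervals are built backwards from `end`, so any remainder becomes a
    shorter first interval.
    """
    if length <= 0:
        raise ValueError("interval length must be positive")
    if end < start:
        raise ValueError("end must not precede start")
    intervals = []
    hi = end
    while hi >= start:
        lo = max(start, hi - length + 1)
        intervals.append((lo, hi))
        hi = lo - 1
    intervals.reverse()
    first_lo, first_hi = intervals[0]
    if drop_short_first and first_hi - first_lo + 1 < length:
        # Mirrors the spatial variation in temporal trends case: warn and drop it.
        warnings.warn("first time interval is shorter than the specified length; ignoring it")
        intervals = intervals[1:]
    return intervals


print(aggregation_intervals(1, 10, 3))  # [(1, 1), (2, 4), (5, 7), (8, 10)]
```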
42
Important: For prospective space-time analyses, the time interval length must equal the time between
successive time-periodic analyses. So, if the analyses are performed every week, then the time interval
should be set to 7 days.
Related Topics: Analysis Tab, Time Precision, Study Period, Computational Speed.
43
Output Tab
44
and show you the results when the analysis is completed. For this to work, you need to have the free
Google Earth software installed on your computer. If you want a KML file but do not want it to
automatically launch Google Earth, you can deselect the automatic launch on the Advanced Output Tab,
and instead simply click on the KML file when you want to show the results. In addition to Google Earth,
there are many other geographical software packages that can also read KML files.
Shape File: For spatial and space-time analyses, SaTScan will create a Shape file that can be used to
depict the detected clusters in geographical information systems. Two different files are created with
extensions .shp and .shx.
The KML and shape files can only be generated when the geographical coordinates are specified using
latitudes and longitudes.
The names of these output files are the same as the names of the text output format files, but with
different filename extensions.
Related Topics: Output Tab, Results of Analysis, Column Format Output Files, Cluster Information File,
Location Information File, Temporal Graph Output File.
Cluster Information File: One row for each cluster, with information about that cluster.
Stratified Cluster Information File: For each cluster, there is one row for each data set when
multiple input data sets are used, and there is one row for each category used by the multinomial
or ordinal model. For each cluster, data set and category, the file contains observed and expected
cases, their ratio and the relative risk. This file is only useful for the multinomial and ordinal
models or when there are multiple data sets. For other analyses this file is redundant as it
contains a subset of the information already in the Cluster Information File.
Location Information File: One row for each location ID, with information about that location
and its cluster membership.
Risk Estimates File: One row for each location ID, with the estimated risk in that location.
Simulated Log Likelihood Ratios File: One row for each simulated data set, with the log
likelihood ratio test statistic for that data set. This file is primarily used by statisticians interested
in the distributional properties of scan statistics.
You must manually open all these files after the run is completed. They are provided in either ASCII or
dBase format so that they can be easily imported into spreadsheets, geographical information systems or
other database software. They have the same name as the standard text based output file, but with a
different filename extension.
Related Topics: Output Tab, Results of Analysis, Cluster Information File, Column Headers, Location
Information File, Risk Estimates for Each Location, Geographical Output Format, Temporal Graph
Output File.
45
Advanced Features
While most SaTScan analyses can be performed using the features on the three basic tabs for input,
analysis and output parameters, additional options are warranted for some types of analyses, and these are
available as advanced features. These features are reached through the Advanced button on the lower
right corner of each of the three main tabs. Advanced should be interpreted as additional or
uncommon rather than complex, difficult or better.
Since many of the advanced options depend on the selections made on the Input and Analysis Tabs, it is
recommended that those two tabs be filled in first.
Related Topics: Basic SaTScan Features, Multiple Data Sets Tab, Spatial Window Tab, Temporal
Window Tab, Spatial and Temporal Adjustments Tab, Inference Tab, Spatial Output Tab.
46
more data sets. The other purpose is to adjust for covariates. In this case the evidence of a cluster is based
on all data sets. The difference is discussed in more detail in the statistical methodology section.
Note: Multiple data sets cannot be used for the continuous Poisson model.
Warning: The computing time is considerably longer when analyzing multiple data sets as compared to a
single data set. Hence, it is not recommended to use multiple data sets when there are many locations in
the coordinates file.
Related Topics: Advanced Features, Input Tab, Multivariate Scan with Multiple Data Sets, Covariate
Adjustments Using Multiple Data Sets, Computing Time, Case File, Control File, Population File.
47
ignored. This may be used if, for example, you only want to analyze a geographical subset of the data, in
which case only the geographical coordinates file has to be modified for a discrete scan statistic while the
other files can be used as they are.
Related Topics: Advanced Features, Case File, Input Tab, Study Time Period.
48
such as both (x1,y1) and (x2,y2). The location ID can then be defined to be included in the circular
scanning window either (i) if at least one of the coordinate sets is located within the circle, or (ii) if and
only if all of the coordinate sets are within the circle. The multiple sets of coordinates are specified in the
coordinates file, with each one on a separate row.
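The two inclusion rules can be expressed as a small predicate. This sketch assumes Cartesian coordinates and uses a hypothetical function name; it is not SaTScan's internal code.

```python
import math


def location_in_circle(coordinate_sets, center, radius, rule="any"):
    """Decide whether a location with several (x, y) coordinate sets is in a circle.

    rule="any": included if at least one coordinate set lies inside the circle.
    rule="all": included only if every coordinate set lies inside the circle.
    """
    inside = [math.dist(point, center) <= radius for point in coordinate_sets]
    if rule == "any":
        return any(inside)
    if rule == "all":
        return all(inside)
    raise ValueError("rule must be 'any' or 'all'")


# A location recorded at two places, (x1, y1) = (0, 0) and (x2, y2) = (5, 0).
sites = [(0.0, 0.0), (5.0, 0.0)]
print(location_in_circle(sites, (0.0, 0.0), 1.0, "any"))  # True
print(location_in_circle(sites, (0.0, 0.0), 1.0, "all"))  # False
```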
Related Topics: Advanced Features, ASCII File Format, Input Tab, Meta Location File, Special
Neighbors File.
Use the Spatial Window Tab to define the exact nature of the scanning window with respect to
space.
Related Topics: Advanced Features, Analysis Tab, Temporal Window Tab, Maximum Spatial Cluster
Size, Include Purely Temporal Clusters.
49
doubt, choose a high percentage, since SaTScan will then look for clusters of both small and large sizes
without any pre-selection bias in terms of the cluster size. When calculating the percentage, SaTScan uses
the population defined by the cases and controls for the Bernoulli model, the covariate adjusted
population at risk from the population file for the discrete Poisson model, the cases for the space-time
permutation, multinomial, ordinal, exponential and normal models and the size of the circle as a
percentage of the total area in the polygons for the continuous Poisson model. When there are multiple
data sets, the maximum is defined as a percentage of the combined total population/cases in all data sets.
It is also possible to specify the maximum circle size in terms of actual geographical size rather than
population. If latitude/longitude coordinates are used, then the maximum radius should be specified in
kilometers. If Cartesian coordinates are used, the maximum radius should be specified in the same units
as the Cartesian coordinates.
Alternatively, for the discrete scan statistics, it is possible to specify a max circle size file to define the
maximum circle size. This file must contain a population for each location, and the maximum circle
size is then defined as a percentage of this population rather than the regular one. This feature may be
used when, for example, you want to define the circles in the Bernoulli or space-time permutation models
based on the actual population rather than the locations of cases and controls. It may also be used if you
want the geographical circles to include, for example, at most 10 counties out of a total of 100,
irrespective of the population in those counties. This is accomplished by assigning a population of 1 to
each county in the special max circle size file and then setting the maximum circle size to 10% of this
population.
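To make the percentage rule concrete, here is a sketch of how the nested circles around one centroid could be capped at a percentage of a location population, such as the population of 1 per county in the example above. Assumptions: Cartesian coordinates, Euclidean distance, ties in distance broken by input order (a real implementation may add tied locations together), and the hypothetical name `nested_circles`.

```python
import math


def nested_circles(center, coordinates, populations, max_percent):
    """Return the growing sets of location IDs around `center` whose total
    population stays within `max_percent` of the overall population."""
    limit = sum(populations.values()) * max_percent / 100.0
    # Nearest locations first (Euclidean distance; ties keep input order).
    ordered = sorted(coordinates, key=lambda loc: math.dist(center, coordinates[loc]))
    circles, members, covered = [], [], 0.0
    for loc in ordered:
        covered += populations[loc]
        if covered > limit:
            break
        members.append(loc)
        circles.append(list(members))
    return circles


# 100 counties on a line, each given a population of 1, max circle size 10%.
counties = {f"county{i}": (float(i), 0.0) for i in range(100)}
ones = {name: 1 for name in counties}
circles = nested_circles((0.0, 0.0), counties, ones, 10)
print(len(circles), len(circles[-1]))  # 10 10
```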
If a prospective space-time analysis is performed, adjusting for earlier analyses, and if the max circle size
is defined as a percentage of the population, then the special max circle size file must be used. This is to
ensure that the evaluated geographical circles do not change over time.
Related Topics: Advanced Features, Spatial Window Tab, Max Circle Size File, Include Purely
Temporal Clusters, Computing Time.
50
shape and angle, all possible sizes of the ellipses are used, up to an upper limit specified by the user in the
same way as for the circular window.
When using an elliptic window shape, it is possible to request a non-compactness (eccentricity) penalty,
which favors more compact ellipses over less compact ones when the latter have only slightly higher
likelihood ratios, while still selecting the less compact ellipse when the difference is larger. The formula
for the penalty is

$$\left[\frac{4s}{(s+1)^2}\right]^a,$$

where s is the elliptic window shape, defined as the ratio of the length of the longest to the shortest axis
of the ellipse. With a strong penalty a=1, with a medium penalty a=1/2 and with no penalty a=0.
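The penalty formula can be evaluated directly. This sketch assumes that the penalty is applied by multiplying each ellipse's log likelihood ratio; that assumption, and the helper names, are not taken from this guide.

```python
import math


def eccentricity_penalty(shape, a):
    """[4s / (s + 1)^2]^a, where s >= 1 is the longest-to-shortest axis ratio."""
    if shape < 1:
        raise ValueError("shape must be at least 1")
    return (4.0 * shape / (shape + 1.0) ** 2) ** a


def penalized_llr(llr, shape, a):
    """Assumed use of the penalty: scale the ellipse's log likelihood ratio."""
    return llr * eccentricity_penalty(shape, a)


for label, a in (("strong", 1.0), ("medium", 0.5), ("none", 0.0)):
    print(label, round(eccentricity_penalty(3.0, a), 4))
# prints: strong 0.75 / medium 0.866 / none 1.0
```

A circle (s=1) always gets a penalty factor of 1, so only elongated ellipses are penalized.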
Note: In batch mode, it is possible to request SaTScan to use any other collection of ellipses to define the
scanning window and any value of the eccentricity penalty parameter greater than zero.
Note: The elliptic window option can only be used when regular two-dimensional Cartesian coordinates
are used, but not when they are specified as latitude/longitude. If you have the latter, you must first do a
planar map projection from the latitude/longitude coordinates, of which there are many different ones
proposed in the geography literature.
Note: The elliptic scanning window is not available for the continuous Poisson model.
Related Topics: Advanced Features, Computing Time, Include Purely Spatial Clusters, Likelihood Ratio
Test, Maximum Spatial Cluster Size, Spatial Temporal and Space-Time Scan Statistics, Spatial Window
Tab.
51
Use the Temporal Window Tab to define the exact nature of the scanning window with respect to
time.
Related Topics: Advanced Features, Analysis Tab, Spatial Window Tab, Maximum Temporal Cluster
Size, Include Purely Spatial Clusters, Flexible Temporal Window Definition.
52
53
Covariates are adjusted for either by including them in the case and population files or by using multiple
data sets, depending on the probability model used. The features on this tab are used to adjust for
temporal, spatial and space-time trends and variation. Most are only available when using the discrete
Poisson probability model. The one exception is the space by day-of-week interaction adjustment for the
space-time permutation model.
Related Topics: Advanced Features, Analysis Tab, Spatial and Temporal Adjustments, Temporal Trend
Adjustment, Spatial Adjustment, Adjustment with Known Relative Risk, Poisson Model.
54
Day-of-Week Adjustment
For some data, such as physician visits, there may be natural day-of-week variation that should be
adjusted for. This is done in a non-parametric fashion. The effect of requesting a day-of-week adjustment
on the Space and Time Adjustments Tab is the same as the effect of including a day-of-week covariate in
the input files.
With the space-time permutation model, day-of-week is automatically adjusted for, whether or not it is
specifically requested, as part of its complete adjustment for any purely temporal variation. Instead, it is
possible to request an adjustment for day-of-week by space interaction. This adjusts for the fact that some
geographical areas may have a different day-of-week pattern than other areas. For example, one medical
clinic may have a large number of weekend visits while another clinic may be closed on weekends. The
effect is the same as including day-of-week as a covariate in the case file used by the space-time
permutation model.
Related Topics: Spatial and Temporal Adjustment Tab, Temporal Trend Adjustments, Adjustment with
Known Relative Risk, Space-Time Permutation Model.
Spatial Adjustment
When a purely spatial analysis is performed, the purpose is to find purely spatial clusters. For the space-time scan statistic, this feature adjusts away all such clusters, to see if there are any space-time clusters
not explained by purely spatial clusters. This is done in a non-parametric fashion, through stratified
randomization by location, so that the total number of cases in each specific location is the same in the
real and random data sets. That is, only the time of a case is randomized.
The default is no spatial adjustment.
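The stratified randomization idea, where each location keeps its case total and only case times are randomized, can be sketched as below. Assumption: each location's cases are redistributed over the time intervals in proportion to that location's expected counts per interval. The function name and that weighting rule are illustrative, not SaTScan's documented internals.

```python
import random


def stratified_randomization(cases_by_location, expected_by_location_time, rng=None):
    """Randomize case times only, keeping every location's case total fixed.

    Assumption: each location's cases are spread over the time intervals in
    proportion to that location's expected counts per interval.
    """
    rng = rng or random.Random()
    randomized = {}
    for loc, n_cases in cases_by_location.items():
        weights = expected_by_location_time[loc]
        counts = [0] * len(weights)
        # Draw a time interval for each of this location's cases.
        for t in rng.choices(range(len(weights)), weights=weights, k=n_cases):
            counts[t] += 1
        randomized[loc] = counts
    return randomized


rng = random.Random(7)
cases = {"A": 30, "B": 5}
expected = {"A": [1.0, 2.0, 0.0], "B": [1.0, 1.0, 1.0]}
fake = stratified_randomization(cases, expected, rng)
print({loc: sum(counts) for loc, counts in fake.items()})  # {'A': 30, 'B': 5}
```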
Note: It is not possible to simultaneously adjust for spatial clusters and purely temporal clusters using
stratified randomization. If both types of adjustments are desired, the space-time permutation model
should be used instead. It is possible to adjust for purely spatial clusters with stratified randomization
together with a temporal adjustment using a log linear trend.
Related Topics: Spatial and Temporal Adjustment Tab, Spatial and Temporal Adjustments, Temporal
Trend Adjustment, Adjustment with Known Relative Risk, Poisson Model.
55
Inference Tab
P-Value
To calculate p-values for detected clusters, the SaTScan program uses computer simulations to generate a
number of random replications of the data set under the null hypothesis. If the maximum likelihood ratio
calculated for the most likely cluster in the real data set is high compared to the maximum likelihood
ratios calculated for the most likely clusters in the random data sets, that is evidence against the null
hypothesis and for the existence of clusters. The comparison can be done in any of three ways, or by
using a combination of them, the latter being the default option.
Standard Monte Carlo: The test statistic is calculated for each random replication as well as for the real
data set, and if the latter is among the 5 percent highest, then the test is significant at the 0.05 level. If it is
among the 1 percent highest, the test is significant at the 0.01 level, and so on. This is called Monte Carlo
hypothesis testing, and was first proposed by Dwass.15 Irrespective of the number of Monte Carlo
replications chosen, the hypothesis test is unbiased, with an exact significance level that is neither
conservative nor liberal, nor merely an estimate. The number of replications does affect the power of the
test, with more replications giving slightly higher power.
In SaTScan, the number of replications must be at least 999 to ensure excellent power for all types of data
sets. For small to medium size data sets, 9999 replications are recommended since computing time is not
a major issue.
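The standard Monte Carlo p-value is the rank of the real-data test statistic among all R+1 values, divided by R+1, which is why the smallest reportable p-value is 1/(R+1). A minimal sketch, assuming ties with the real data are counted against it (a conservative choice) and using a hypothetical function name:

```python
def monte_carlo_p_value(observed, simulated):
    """Standard Monte Carlo p-value: rank of the real statistic among all
    R + 1 statistics, divided by R + 1. Ties count against the real data."""
    rank = 1 + sum(1 for value in simulated if value >= observed)
    return rank / (len(simulated) + 1)


# 999 replicas, 49 of which beat the real data: rank 50, p = 50/1000.
replicas = [20.0] * 49 + [1.0] * 950
print(monte_carlo_p_value(10.0, replicas))  # 0.05
```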
SaTScan User Guide v9.4
56
Sequential Monte Carlo: With more Monte Carlo replications, the power of the scan statistic is higher,
but it is also more time consuming to run. When the p-value is small, this is often worth the effort, but for
large p-values it is often irrelevant whether, for example, p=0.7535 or p=0.8545. SaTScan provides the
option to terminate the Monte Carlo simulations early when the p-value is large, by employing the
sequential Monte Carlo test.16, 17 With this option, the SaTScan calculations will terminate as soon as a
fixed number of Monte Carlo replicas have a likelihood ratio that is larger than the likelihood ratio from the
real data set. The default value is 50 replicas. If the fixed number is never reached, the calculations will
continue until the maximum number of Monte Carlo replicas has been reached. With the default values of
50 and 999, there is no loss of power at the alpha=0.05 level when comparing the sequential to the
standard Monte Carlo test.
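The early-stopping rule can be sketched as follows. The p-value returned on early stopping, the number of exceedances divided by the number of replicas run, is the usual sequential Monte Carlo estimate. Treat it as an assumption about SaTScan's exact reporting rule. `simulate` is a hypothetical callable that returns the maximum likelihood ratio of one random replica.

```python
import random


def sequential_mc_p_value(observed, simulate, max_replicas=999, stop_after=50):
    """Sequential Monte Carlo test (sketch). Returns (p_value, replicas_run).

    Stops as soon as `stop_after` replicas exceed the observed statistic and
    then returns stop_after / replicas_run; otherwise falls back to the
    standard Monte Carlo p-value over all `max_replicas` replicas.
    """
    exceed = 0
    for run in range(1, max_replicas + 1):
        if simulate() > observed:
            exceed += 1
            if exceed == stop_after:
                return exceed / run, run
    return (exceed + 1) / (max_replicas + 1), max_replicas


rng = random.Random(42)
p, used = sequential_mc_p_value(0.5, rng.random)  # null statistic ~ Uniform(0, 1)
print(p, used)  # p near 0.5, after far fewer than 999 replicas
```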
Gumbel Approximation: With 999 random replicas, the lowest p-value that the Monte Carlo hypothesis
testing can report is 1/(999+1)=0.001. Likewise, with 9999 replicas, the lowest possible p-value is 0.0001.
As an alternative option to Monte Carlo hypothesis testing, it is possible to employ the Gumbel extreme
value distribution to estimate approximate p-values.18 With this approach, there is no lower limit on the
resulting p-values (other than p>0, of course). The method works by first generating 999 (or some other
number of) random replicas of the data under the null hypothesis. The maximum likelihood ratio from
each replica is then used to fit a Gumbel distribution using method of moments estimation.
Once the Gumbel distribution that best fits the data has been obtained, the p-value is calculated as the
probability that this distribution generates a value greater than the maximum likelihood ratio observed for
the most likely cluster from the real data set.
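With method of moments estimation, the Gumbel scale is the sample standard deviation times √6/π, and the location is the sample mean minus the Euler-Mascheroni constant times the scale. The sketch below applies those standard formulas. The function name is hypothetical, and SaTScan's own implementation details (for example, the variance divisor) may differ.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_p_value(observed, simulated):
    """Approximate p-value from a Gumbel fit to the replicas' maximum log
    likelihood ratios, using method of moments estimates."""
    mean = statistics.fmean(simulated)
    sd = statistics.stdev(simulated)
    scale = sd * math.sqrt(6.0) / math.pi          # Gumbel variance = (pi * scale)^2 / 6
    location = mean - EULER_GAMMA * scale          # Gumbel mean = location + gamma * scale
    z = math.exp(-(observed - location) / scale)
    # P(X > observed) = 1 - exp(-z); expm1 keeps precision for tiny p-values.
    return -math.expm1(-z)


# Replicas placed at Gumbel(5, 1) quantiles: p at the location is about 1 - 1/e.
n = 9999
replicas = [5.0 - math.log(-math.log((i + 0.5) / n)) for i in range(n)]
print(round(gumbel_p_value(5.0, replicas), 2))  # 0.63
print(gumbel_p_value(40.0, replicas) > 0.0)     # True: no 1/(R+1) floor
```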
For the purely spatial scan statistic with the discrete Poisson and Bernoulli probability models, it has been
shown that the Gumbel distribution fits the data very well and that it generates very accurate p-values.18
There have not yet been any similar studies for the other scan statistics, so this option is for the time being
only available for purely spatial analyses with those two probability models.
Default P-value: As the default, SaTScan will calculate the p-values by using a combination of the three
methods described above. For example, it may present the sequential Monte Carlo based p-value unless
the p-value is very small, in which case it will report the Gumbel approximation. The exact approach
depends on the type of analysis requested and the nature of the data, since the sequential Monte Carlo and
the Gumbel approximation do not work for all analyses and data sets.
Note: In prior versions of SaTScan, the standard Monte Carlo was the default method for calculating p-values; the other options were introduced in version 9.0.
Related Topics: Analysis Tab, Computing Time, Inference Tab, Likelihood Ratio Test, Monte Carlo
Rank, Monte Carlo Replications, Random Number Generator, Results of Analysis.
57
For the adjustment to be correct, it is important that the scanning spatial window is the same for each
analysis that is performed over time. This means that the grid points defining the circle centroids must
remain the same. If the location IDs in the coordinates file remain the same in each time-periodic
analysis, then there is no problem. On the other hand, if new IDs are added to the coordinates file over
time, then you must use a special grid file and retain this file through all the analyses. Also, when you
adjust for earlier analyses, and if the max circle size is defined as a percentage of the population, then the
special max circle size file must be used.
Related Topics: Inference Tab, Computing Time, Type of Analysis, Spatial Temporal and Space-Time
Scan Statistics.
58
Oliveira's F
This option is available for a purely spatial Poisson analysis, when the most likely clusters are reported
hierarchically, with no geographical overlap.
Oliveira's F is a measure of how likely a particular location is to be in the true cluster, if and when there is a
true cluster. It is calculated for each location, such as a census tract, postal code or country, depending on
the input data. A higher F-value means that the location is more likely to be part of the true cluster, but, and
this is important, even though it takes a value in the interval [0,1], it should NOT be interpreted as the
59
probability of belonging to the cluster. The only interpretation should be that a higher value means that it
is more likely to be part of the true cluster. If there are no statistically significant clusters, there is no
Oliveira's F to report, as they are all zero.
As implemented in SaTScan, Oliveira's F is a variation on the method described by Oliveira et al. (2011).
When there is only a single statistically significant cluster reported, it is calculated in the following
manner:
1. Conditioned on the total number of cases in the data set, generate a random data set, where the
expected count for a location is equal to the observed count, using a multinomial probability
distribution. These are the counts generated by independent Poisson distributions conditioned on
the total number of cases observed.
2. For this random data set, run the spatial scan statistic with the original population based expected
counts. Note which locations are inside the most likely cluster.
3. Repeat steps 1 and 2 a large number of times, e.g. 1000. For each location, note how many of the
1000 times it belonged to the most likely cluster. Oliveira's F is defined as the proportion of those
times that it belonged to the most likely cluster.
When the SaTScan analysis generates two statistically significant clusters, step 2 above makes use of the
two most likely clusters, and so on.
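The three steps for the single-cluster case can be sketched in Python as follows. Assumptions, all illustrative: a short list of candidate windows (sets of location IDs) stands in for SaTScan's circular scanning windows, the standard Poisson log likelihood ratio for elevated risk is used to pick the most likely cluster, and all function names are hypothetical. The multi-cluster variant is not shown.

```python
import math
import random
from collections import Counter


def poisson_llr(c, e, total):
    """Poisson log likelihood ratio for an elevated-risk window (0 if c <= e)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def most_likely_cluster(counts, expected, windows):
    """Scan a fixed list of candidate windows; return the best one or None."""
    total = sum(counts.values())
    best, best_llr = None, 0.0
    for window in windows:
        c = sum(counts[loc] for loc in window)
        e = sum(expected[loc] for loc in window)
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best, best_llr = window, llr
    return best


def oliveira_f(observed, expected, windows, replicates=1000, rng=None):
    """Share of bootstrap data sets whose most likely cluster contains each location."""
    rng = rng or random.Random()
    locations = list(observed)
    total = sum(observed.values())
    weights = [observed[loc] for loc in locations]
    hits = Counter()
    for _ in range(replicates):
        # Step 1: multinomial draw with the observed total; the expected count
        # at each location equals its observed count.
        draw = Counter(rng.choices(locations, weights=weights, k=total))
        counts = {loc: draw[loc] for loc in locations}
        # Step 2: rescan using the original population-based expected counts.
        cluster = most_likely_cluster(counts, expected, windows)
        if cluster is not None:
            hits.update(cluster)
    # Step 3: F = proportion of replicates in which the location was in the cluster.
    return {loc: hits[loc] / replicates for loc in locations}


observed = {"A": 25, "B": 5, "C": 5, "D": 5}
expected = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 10.0}  # population based, sums to 40
windows = [frozenset(w) for w in (["A"], ["B"], ["C"], ["D"], ["A", "B"], ["C", "D"])]
print(oliveira_f(observed, expected, windows, replicates=200, rng=random.Random(0)))
```

In this toy data set, location A should receive an F close to 1 and location D an F close to 0.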
Since the calculation of Oliveira's F requires a second set of simulations, it increases the required
computing time.
The resulting Oliveira's F values are provided in the Risk Estimates for Each Location results file. This is
normally an optional results file, but when Oliveira's F is calculated, this file will be created even if it was
not requested on the Output Tab. Oliveira's F is also provided in the Location Information results file,
but this file is only generated if it was explicitly requested on the Output Tab.
The F values provided by SaTScan can be used to create a color-coded map where the locations in a
cluster are depicted with different colors depending on their F values.
Related Topics: Border Analysis Tab, Criteria for Reporting Overlapping Clusters, Risk Estimates for
Each Location File.
60
61
The statistical power will vary greatly depending on the total number of cases in the data set, on the
population size of the cluster and on the relative risk in the cluster, with higher values resulting in higher
power.
This feature is only available for the discrete Poisson probability model.
Related Topics: Poisson Probability Model, Alternative Hypothesis File
62
63
64
including clusters with p=1.0. This means that the number of clusters reported is identical to the number
of grid points. WARNING: This option may create output files that are very large in size.
Gini Index Cluster Reporting Option
This option is only valid for purely spatial analyses with the Poisson or Bernoulli models. To create the
collection of clusters based on the Gini index, SaTScan first defines a collection of upper limits on the
cluster size, with the default collection being 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30, 35, 40, 45 and 50
percent of the population. For each upper limit, the hierarchical no-geographical-overlap cluster collection
criterion is used to define a set of clusters, as described above. The Gini index is then calculated for this
set of clusters, and when this is repeated for each upper limit, we get one Gini index per upper limit
(seventeen with the default collection). SaTScan then picks the collection that maximizes the Gini index.
These are called Gini clusters. As an optional feature, SaTScan will report the values of the Gini indices
for each of the upper limits used.
Note: It is possible to request that both the hierarchical and Gini clusters are reported, and this is the
default setting. This means that there may be overlapping clusters even when the hierarchical non-overlapping cluster option is selected, since some of the hierarchical clusters may overlap with some of
the Gini clusters, without being identical. If none are requested, SaTScan will only report the most likely
cluster.
Note: The criterion for determining overlap is based only on geography, ignoring time. Hence, in a space-time analysis, a secondary cluster may not be reported if it is in the same location as a more likely cluster,
even if they are non-overlapping in time.
Related Topics: Advanced Features, Inference Tab, Results of Analysis, Maximum Spatial Cluster Size,
Report Only Small Clusters.
65
Temporal Graphs
When a purely temporal or a space-time analysis is conducted, you can request SaTScan to produce
temporal graphs depicting the observed and expected counts over time, both inside and outside the cluster.
They also show the ratio of observed over expected counts.
This option is available for the Poisson, Bernoulli, space-time permutation and exponential probability
models, but not for the multinomial, ordinal or normal probability models.
You can select whether you only want a graph for the most likely cluster, or if you want multiple graphs
for a fixed number of clusters or all clusters with a p-value less than some specified value.
The graphs are generated as HTML files that you can open in any web browser. The name and location of
the file is listed in the parameter section of the standard results file. Once you have opened the HTML
file, you can edit it, and also generate the temporal graphs in PNG, JPEG, PDF and SVG formats. These
other formats do not need to be pre-specified.
Related Topics: Output Tab, Results of Analysis, Temporal Graphs HTML File.
Critical Values
When selecting this option, SaTScan will report the critical values needed in order for a cluster to be
statistically significant at the 0.05 and 0.01 alpha levels. The critical values are reported in the standard
results file.
Related Topics: Standard Results File.
Column Headers
Check this box if you want column headers in the Other Output files.
Related Topics: Column Output Format, Column Information File, Location Information.
Running SaTScan
Specifying Analysis and Data Options
The SaTScan program requires that you specify parameters defining input, analysis and output options for
the analysis you wish to conduct. A tabbed dialog is provided for this purpose. To access the parameter
tab dialog, either press the corresponding button on the tool bar or select the File/New menu item. Specify the parameters for your
session on the following tabs:
Input Tab
Analysis Tab
Output Tab
See the section on Basic SaTScan Features for instructions on how to fill in these tabs.
Most analyses can be performed using only these three tabs. For each tab, there are additional features
that can be selected by first clicking on the Advanced button in the lower right corner of the tab. These
additional features may be useful in special circumstances.
The available choices for some features may depend on what was selected in other places. For example, if
a purely spatial analysis is chosen, the space-time permutation model is not available, and vice versa.
Related Topics: Basic SaTScan Features, Input Tab, Analysis Tab, Output Tab, Advanced Features,
Launching the Analysis.
Status Messages
Status messages are displayed as the program executes the analysis, as the data is read, and at each step of
the analysis. Normal status messages are displayed in the top box of the job status window. Warnings and
error messages are displayed in the bottom box of the job status window. Upon successful completion of
the calculations, the standard results file will be shown in the job status window.
Related Topics: Launching the Analysis, Warnings and Errors.
Warning Messages
SaTScan may produce warnings as the job is executing. If a warning occurs, a message is displayed in the
Warnings/Errors box at the bottom of the job status window. A warning will not stop the execution of the
analysis. If a warning occurs, please review the message and access the help system if further information
is required.
If you do not want to see the warning messages, they can be turned off by clicking Session > Execute
Options > Do not report warning messages.
Error Messages
If a serious problem occurs during the run, an error message will be displayed in the Warnings/Errors box
at the bottom of the job status window and the job will be terminated. The user may resolve most errors
by reviewing the message and using the help system.
One of the most common errors is that the input files are not in the required format, or that the file
contents are incompatible with each other. When this occurs, an error message will be shown specifying
the nature and location of the problem. Such error messages are designed to help with data cleaning.
If the problem cannot be resolved, you may press the email button on the job status window. This
will generate an automatic e-mail message to SaTScan technical support, with the contents of the
Warnings/Errors box placed in the message automatically. All a user needs to do is click Send in
their e-mail program. Users may also print the contents of the Warnings/Errors box, or select, copy
(Ctrl+C) and paste (Ctrl+V) the contents if necessary.
Related Topics: Input Data, Data Requirements, SaTScan Support.
Parallel Processors
If you have parallel processors on your computer, SaTScan can take advantage of this by running
different Monte Carlo simulations using different processors, thereby increasing the speed of the
calculations. The default is that SaTScan will use all processors that the computer has. If you want to
restrict the number, you can do that by clicking on Session > Execute Options, and selecting the
maximum number of processors that SaTScan is allowed to use.
Batch Mode
SaTScan is most easily run by clicking the Execute button at the top of the SaTScan window, after
filling out the various parameter fields in the Windows interface.
An alternative approach is to skip the Windows interface and launch the SaTScan calculation engine
directly by either:
Computing Time
The spatial and space-time scan statistics are computer intensive to calculate. The computing time
depends on a wide variety of variables; depending on the data set and the analytical options chosen, it
could range from a few seconds to several days or weeks. The multinomial, ordinal and normal models
are in general much more computer intensive than the other discrete scan statistics. Other than that, the
three main factors that increase the computing time are the number of locations in the coordinates and
special grid files, the number of time intervals (for space-time analyses) and the number of data sets used.
The computing time is approximately on the order of:
$$ L \cdot G \cdot M \cdot T^{k} \cdot m \cdot S / P $$
where:
L = number of geographical data locations in the coordinates file (L = 1 for purely temporal
analyses)
G = number of geographical coordinates in the special grid file. If there is no such file, G = L.
M = maximum geographical cluster size, as a proportion of the population (0 < M ≤ 0.5; M = 1 for
a purely temporal analysis)
T = number of time intervals into which the temporal data is aggregated (T = 1 for a purely spatial
analysis)
m = maximum temporal cluster size, as a proportion of the study period (0 < m ≤ 0.9; m = 1 for a
purely spatial analysis)
S = number of Monte Carlo simulations
P = number of processors available on the computer for SaTScan use
k = 1 for purely spatial, prospective temporal and prospective space-time analyses without
adjustments for earlier analyses
k = 2 for retrospective temporal and retrospective space-time analyses
The unit of the above formula depends on the probability model used and on the speed of the computer.
When the total number of cases is very large compared to the number of locations and time intervals, the
computing time for the discrete Poisson, Bernoulli and exponential models is instead on the order of:
$$ C \cdot S / P $$
where C is the total number of cases, and S and P are as defined above.
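The two order-of-magnitude formulas can be turned into a rough planning aid for comparing analysis settings. The Python sketch below is illustrative only: it returns relative cost in arbitrary units (the real unit depends on the model and the computer, as noted above), and it assumes the first formula is read as L·G·M·T^k·m·S/P, with k acting as an exponent on T. The function names `scan_cost` and `large_count_cost`, and the default argument values, are hypothetical and not part of SaTScan.

```python
def scan_cost(L, G=None, M=0.5, T=1, m=1.0, S=999, P=1, retrospective=False):
    """Relative computing cost L*G*M*T**k*m*S/P, in arbitrary units.

    Assumes k is an exponent on T: k = 2 for retrospective temporal and
    space-time analyses, k = 1 otherwise (prospective without adjustment for
    earlier analyses, or purely spatial where T = 1 anyway).
    """
    if G is None:
        G = L  # no special grid file: the data locations are the centroids
    k = 2 if retrospective else 1
    return L * G * M * T**k * m * S / P


def large_count_cost(C, S=999, P=1):
    """Relative cost C*S/P, for data with very many cases C compared to the
    number of locations and time intervals."""
    return C * S / P


# A retrospective analysis with 52 time intervals is estimated to cost
# 52 times as much as a prospective one with otherwise identical settings.
base = scan_cost(L=1000, T=52, m=0.5, S=999, P=4)
retro = scan_cost(L=1000, T=52, m=0.5, S=999, P=4, retrospective=True)
print(retro / base)  # 52.0
```

Because the estimate only matters in relative terms, it is most useful for questions such as how much a finer time aggregation or an extra processor would change the run time.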
Memory Requirements
SaTScan uses dynamic memory allocation. Depending on the nature of the input data, SaTScan will
automatically choose one of two memory allocation schemes: the standard one and a special one for data
sets with very many spatial locations but few time intervals and few simulations.
                 Memory Needed
  3,500           32 MB
  6,500           64 MB
 10,000          128 MB
 15,000          256 MB
 22,000          512 MB
 32,000            1 GB
 44,000            2 GB
 63,000            4 GB
 89,000           16 GB
126,000           32 GB
178,000           64 GB
250,000          128 GB
Insufficient Memory
If there is insufficient memory available on the computer to run the analysis using either memory
allocation scheme, there are several options available for working around the limitation:
Decrease the number of circle centroids in the special grid file (reduce G).
It is highly desirable to have sufficient RAM to cover all the memory needs, as SaTScan runs
considerably slower when the swap file is used, so these techniques may also be used to avoid the swap
file. Not all of the above options will work for all data sets. Please note that the following SaTScan
options do not influence the demand on memory:
Note: The 32-bit Windows operating system can allocate a maximum of 2 GB of memory to a single
application, and that is hence the upper limit on the memory for the 32-bit Windows version of SaTScan.
The Linux version of SaTScan can be used to analyze larger data sets.
Related Topics: Coordinates File, Grid File, Spatial Temporal and Space-Time Scan Statistics, Spatial
Window Tab, Temporal Window Tab, Monte Carlo Replications, Multiple Data Sets Tab, Warnings and
Errors.
Results of Analysis
As output, SaTScan creates one standard text-based results file in ASCII format, optional geographical
output files in KML or shapefile format, an optional temporal graph file in HTML format, and
five optional column format output files in either ASCII or dBase format. Some of the optional files are
useful when exporting output from SaTScan into other software such as a spreadsheet or a geographical
information system.
All results files will be in the same directory and have the same name as the standard results file specified
on the Output Tab, except for the extension.
Related Topics: Output Tab, Spatial Output Tab, Standard Results File, Cluster Information File,
Location Information File, Risk Estimates for Each Location, Simulated Log Likelihood Ratios, Analysis
History File.
SUMMARY OF DATA: Use this to check that the input data files contain the correct number of
cases, locations, etc.
Total population (discrete Poisson model): This is the average population during the study
period.
Annual rate per 100,000 (discrete Poisson model): This is calculated taking leap years into
account and is based on the average length of a year of 365.2425. If calculated by hand ignoring
leap years, the numbers will be slightly different, but not by much.
Variance (normal model): This is the variance for all observations in the data assuming a
common mean.
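A short worked example shows why ignoring leap years changes the annual rate only slightly. The Python sketch below assumes the rate is the number of cases divided by (average population × study length in years), with the study length counted inclusively in days and converted using 365.2425 days per year; SaTScan's internal calculation may differ in detail. The function name and the numbers are hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.2425  # average year length quoted in this guide


def annual_rate_per_100k(cases, average_population, start, end):
    """Annual rate per 100,000 over the study period [start, end].

    Assumes the period is counted inclusively in days and that the rate is
    cases / (population * years) * 100,000.
    """
    days = (end - start).days + 1
    years = days / DAYS_PER_YEAR
    return cases / (average_population * years) * 100_000


# The ten calendar years 2000-2009 contain three leap days (3,653 days in
# total), so the rate comes out slightly below the 5 per 100,000 obtained
# when every year is taken to be exactly one year long.
rate = annual_rate_per_100k(500, 1_000_000, date(2000, 1, 1), date(2009, 12, 31))
print(round(rate, 4))
```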
MOST LIKELY CLUSTER: Summary information about the most likely cluster, that is, the
cluster that is least likely to be due to chance.
Radius: When latitude and longitude are used, the radius of the circle is given in kilometers.
When regular Cartesian coordinates are used, the radius of the circle is given in the same units as
those used in the coordinates file.
Population: This is the average population in the geographical area of the cluster. The average is
taken over the whole study period even when it is a space-time cluster whose temporal length is
only a part of the study period.
Relative Risk: This is the estimated risk within the cluster divided by the estimated risk outside
the cluster. It is calculated as the observed divided by the expected within the cluster divided by
the observed divided by the expected outside the cluster. In mathematical notation, it is:
$$ RR = \frac{c / E[c]}{(C - c) / (E[C] - E[c])} = \frac{c / E[c]}{(C - c) / (C - E[c])} $$
where c is the number of observed cases within the cluster, E[c] is the expected number of cases within
the cluster, and C is the total number of cases in the data set. Note that since the analysis is conditioned
on the total number of cases observed, E[C] = C.
Observed / Expected: This is the observed number of cases within the cluster divided by the
expected number of cases within the cluster when the null hypothesis is true, that is, when the risk
is the same inside and outside the cluster. This means that it is the estimated risk within the
cluster divided by the estimated risk for the study region as a whole. It is calculated as: c/E[c].
For the continuous Poisson model, the expected count is an upper bound when the scanning
window crosses the border of the spatial study region. That means that the Obs/Exp is a lower
bound.
Variance (normal model): This is the estimated common variance for all observations in the data,
taking into account the different estimated means inside and outside the cluster. The weighted
variance is adjusted for the weights when provided by the user.
Time trend (spatial variation in temporal trends): Provides the estimated time trends inside and
outside the detected clusters on the log linear scale where the percent increase or percent decrease
is constant over time.
P-value: The p-values are adjusted for the multiple testing stemming from the multitude of
circles/cylinders corresponding to different spatial and/or temporal locations and sizes of
potential clusters evaluated. This means that under the null hypothesis of complete spatial
randomness there is a 5% chance that the p-value for the most likely cluster will be smaller than
0.05 and a 95% chance that it will be bigger. Under the null hypothesis there will always be some
area with a rate higher than expected just by chance alone. Hence, even though the most likely
cluster always has an excess rate when scanning for areas with high rates, the p-value may
actually be very close to or equal to one.
Recurrence Interval: For prospective analyses, the recurrence interval [20] (or null occurrence
rate) is shown as an alternative to the p-value. The measure reflects how often a cluster of the
observed or larger likelihood will be observed by chance, assuming that analyses are repeated on
a regular basis with a periodicity equal to the specified time interval length. For example, if the
observed p-value is used as the cut-off for a signal and if the recurrence interval is once in 14
months, then the expected number of false signals in any 14-month period is one.
If no adjustments are made for earlier analyses, then the recurrence interval is once in D/p days,
where D is the number of days in each time interval. If adjustments are made for A−1 earlier
analyses, then the recurrence interval is once every $D / [1 - (1-p)^{1/A}]$ days.
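The cluster measures above reduce to simple arithmetic. The Python sketch below assumes the formulas exactly as written in this section (RR with E[C] = C, and a recurrence interval of D / [1 − (1 − p)^{1/A}] days); the function names and the example numbers are hypothetical. It also shows that RR exceeds Obs/Exp for a high-rate cluster, because when the cluster holds more than its expected share of cases, the observed/expected ratio outside the cluster falls below one.

```python
def observed_over_expected(c, expected_c):
    """Obs/Exp inside the cluster: c / E[c]."""
    return c / expected_c


def relative_risk(c, expected_c, total_cases):
    """RR = (c / E[c]) / ((C - c) / (C - E[c])).

    Uses E[C] = C, since the analysis is conditioned on the total number of cases C.
    """
    inside = c / expected_c
    outside = (total_cases - c) / (total_cases - expected_c)
    return inside / outside


def recurrence_interval_days(p, D, A=1):
    """Recurrence interval (days) for a prospective analysis.

    p: p-value; D: days per time interval; A: number of analyses adjusted for
    (the current one plus A - 1 earlier ones). A = 1 gives D / p.
    """
    if not (0 < p <= 1) or D <= 0 or A < 1:
        raise ValueError("need 0 < p <= 1, D > 0 and A >= 1")
    if A == 1:
        return D / p
    return D / (1 - (1 - p) ** (1 / A))


# Hypothetical cluster: 30 observed vs 10 expected, out of C = 100 cases.
print(observed_over_expected(30, 10))          # 3.0
print(relative_risk(30, 10, 100))              # about 3.857 (exactly 27/7)
# Weekly analyses (D = 7) with p = 0.01:
print(recurrence_interval_days(0.01, 7))       # about 700 days
print(recurrence_interval_days(0.01, 7, A=4))  # about 2789 days
```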
SECONDARY CLUSTERS: Summary information about other clusters detected in the data.
The information provided is the same as for the most likely cluster. Only clusters with p<1 are
displayed.
P-values listed for secondary clusters are calculated in the same way as for the most likely cluster,
by comparing the log likelihood ratio of secondary clusters in the real data set with the log
likelihood ratios of the most likely cluster in the simulated data sets. This means that if a
secondary cluster is significant, it can reject the null hypothesis on its own strength without the help
of any other clusters. It also means that these p-values are conservative [1].
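The comparison described above can be illustrated with the usual rank-based Monte Carlo p-value, p = (number of simulated maximum log likelihood ratios at least as large as the observed one + 1) / (S + 1). This is a minimal sketch, assuming that formula; it shows why secondary clusters, whose log likelihood ratios are ranked against the simulated *most likely* clusters, receive conservative p-values. `monte_carlo_p_value` is a hypothetical helper, and the simulated values are made up.

```python
def monte_carlo_p_value(observed_llr, simulated_max_llrs):
    """Rank-based Monte Carlo p-value.

    simulated_max_llrs holds, for each of the S simulated data sets, the log
    likelihood ratio of that data set's most likely cluster. The real-data
    value counts as one of S + 1 values, so the smallest possible p is 1/(S + 1).
    Ties are counted against the observed value (the conservative choice).
    """
    at_least_as_large = sum(1 for llr in simulated_max_llrs if llr >= observed_llr)
    return (at_least_as_large + 1) / (len(simulated_max_llrs) + 1)


# Toy example with S = 999 made-up simulated maxima 1.0, 2.0, ..., 999.0.
sims = [float(i) for i in range(1, 1000)]
print(monte_carlo_p_value(995.5, sims))  # most likely cluster: 0.005
print(monte_carlo_p_value(500.5, sims))  # secondary cluster, same reference set: 0.5
```

Both clusters are ranked against the same set of simulated maxima, so a secondary cluster is never helped by the presence of the most likely cluster.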
PARAMETER SETTINGS: A reminder of the parameter settings used for the analysis.
Additional results files: The name and location of additional results files are provided, when
applicable.
Related Topics: Output Tab, Spatial Output Tab, Cluster Information File, Location Information File,
Risk Estimates for Each Location, Simulated Log Likelihood Ratios, Cartesian Coordinates, Column
Output Format.
By clicking on the descriptors below the graph, you can remove selected parts of the graph, as well
as add them back again.
By moving the mouse over the graph, you will see the exact numerical observed and expected
counts for each time interval.
By clicking on Show Chart Options at the bottom left of the graph, you can add a title,
switch between showing the counts using histograms and lines, and make the cluster
more visible by adding a cluster band.
If you want to print the graph, click on the three horizontal lines in the upper right-hand corner of the
graph. There you can also save the graph in four other file formats: PNG, JPEG, PDF and SVG. This is
done separately for each cluster.
Related Topics: Temporal Output Tab, Results of Analysis.
Table 3: Content of the cluster information output file, with dBase variable names and examples of
column ordering for a few different types of analyses.
SaTScan User Guide v9.4
<Location ID>
<Cluster Number>
<P-Value of Cluster>
<Observed/Expected in Cluster>
<Observed/Expected in Location>
Note: The second, third, fourth, fifth, sixth and seventh column entries are the same for all locations
belonging to the same cluster.
Related Topics: Output Tab, Results of Analysis, Standard Results File, Cluster Information File.
This file may be accessed using any text editor or spreadsheet program. It will have the same name as the
results file, but with the extension *.rr.txt or *.rr.dbf, and it will be located in the same directory. The file
is only available for the discrete scan statistics, and hence not for the continuous Poisson model.
If Oliveira's F is requested on the Boundary Analysis Tab, there will be an additional column showing
Oliveira's F for each location ID.
Related Topics: Oliveira's F, Output Tab, Results of Analysis, Standard Results File.
Miscellaneous
New Versions
To check whether there is a later version than the one you are currently using, simply click on the update
button on the tool bar. If a newer version exists, you will be asked whether you want to download and
install it. After checking, you can request that SaTScan automatically checks for new versions once a
week, once a month or every time SaTScan is used. Alternatively, you can set SaTScan to only check for
new versions manually, when you decide to do so.
At any given time, it is also possible to download the latest version of SaTScan from the World Wide
Web at http://www.satscan.org/.
Related Topics: Download and Installation.
Contact Us
Please direct technical questions about installation and running the program, as well as the web site, to:
techsupport@satscan.org
Please direct substantive questions about the statistical methods and suggestions about new features to:
Martin Kulldorff, Professor, Biostatistician
Department of Population Medicine
Harvard Medical School and Harvard Pilgrim Health Care Institute
133 Brookline Avenue, 6th Floor, Boston, MA 02215, USA
Email: kulldorff@satscan.org
Acknowledgements
Financial Support
National Cancer Institute, Division of Cancer Prevention, Biometry Branch [SaTScan v1.0, 2.0, 2.1]
National Cancer Institute, Division of Cancer Control and Population Sciences, Statistical Research and
Applications Branch [SaTScan v3.0 (part), 6.1 (part), 9.2-9.4]
Alfred P. Sloan Foundation, through a grant to the New York Academy of Medicine (Farzad
Mostashari, PI) [SaTScan v3.0 (part), 3.1, 4.0, 5.0, 5.1]
Centers for Disease Control and Prevention, through Association of American Medical Colleges
Cooperative Agreement award number MM-0870 [SaTScan v6.0, 6.1 (part)]
National Institute of Child Health and Development, through grant #R01HD048852
[SaTScan v7.0, 8.0, 9.0 (part)]
National Institute of General Medical Sciences, through Modeling Infectious Disease Agent Study
(MIDAS) grant #U01GM076672 [SaTScan v9.0 (part), 9.1]
Their financial support is greatly appreciated. The contents of SaTScan are the responsibility of the
developer and do not necessarily reflect the official views of funders.
Allyson Abrams, Harvard Medical School & Harvard Pilgrim Health Care
Frank Boscoe, New York State Health Department
Eric Feuer, National Cancer Institute
Laurence Freedman, National Cancer Institute
David Gregorio, University of Connecticut
Göran Gustafsson, Karolinska Institute, Sweden
Jessica Hartman, New York Academy of Medicine
Richard Heffernan, New York City Department of Health
Kevin Henry, New Jersey Department of Health
Ulf Hjalmars, Östersund Hospital, Sweden
Richard Hoskins, Washington State Department of Health
SaTScan makes sure that the input data files are compatible with each other, and with the options
specified on the Windows interface. For example, it complains if there is a location ID in the case
file that is not present in the coordinates file, as it must know where to locate those cases. For
most data sets there is some need for data cleaning, and SaTScan is designed to help with this
process by spotting and pointing out any inconsistencies found.
2. I have constructed the ASCII input files exactly according to the description in the SaTScan
User Guide, but SaTScan complains that they are not in the correct format. What is wrong?
The most likely explanation is that the files are in UNICODE rather than ASCII format. Just
convert to ASCII and it should work.
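If you prefer to script the conversion, here is a minimal Python sketch. It assumes the "Unicode" file is UTF-16 with a byte order mark, which is what many Windows tools save under that name; a UTF-8 file would need `src_encoding="utf-8"` (or `"utf-8-sig"` if it starts with a byte order mark) instead. The helper names and file names are hypothetical.

```python
from pathlib import Path

UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little- and big-endian byte order marks


def looks_like_utf16(path):
    """True if the file starts with a UTF-16 byte order mark."""
    with open(path, "rb") as f:
        return f.read(2) in UTF16_BOMS


def convert_to_ascii(src, dst, src_encoding="utf-16"):
    """Re-save a text file as plain ASCII.

    Characters with no ASCII equivalent are replaced by '?', so check the
    output if location IDs contain accented letters.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="ascii", errors="replace")


# Example with hypothetical file names:
# if looks_like_utf16("cases.cas"):
#     convert_to_ascii("cases.cas", "cases_ascii.cas")
```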
3. In my data, there is zero or only one case in most locations. Can I use SaTScan for such sparse
data?
Yes, you certainly can. One of the main reasons for using SaTScan is to avoid arbitrary
geographical aggregation of the data, letting the scan statistic consider different smaller or larger
aggregations through its continuously moving window. With finer geographical resolution of the
input data, SaTScan can evaluate more different cluster locations and sizes without restrictions
imposed by administrative geographical boundaries, minimizing assumptions about the
geographical cluster location and size.
The stability of rates does not depend on the geographical resolution of the input data, but on the
population size of the circles constructed by SaTScan.
The purely temporal scan statistic can be run with only one geographical location. The space-time
scan statistic needs at least two locations. With only two locations, the space-time scan statistic
will look for temporal clusters in either or both of the locations. Technically, the purely spatial
scan statistic can also be run using only two geographical locations, providing correct inference.
There is no point in using a purely spatial scan statistic for such data, though; a regular
chi-square test can be used instead, since there is no multiple testing to adjust for. With three
locations or more, the fundamental scan statistic concept of including different combinations of
locations into the potential clusters is being utilized. In most practical applications though, the
spatial and space-time scan statistics are used for data sets with hundreds or thousands of
geographical locations. If there is a choice, less spatial aggregation of the data is typically better,
which means more geographical locations.
Analysis
6. With latitude/longitude coordinates, what planar projection is used?
No projection is used. SaTScan draws perfect circles on the spherical surface of the earth.
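To make "circles on the spherical surface" concrete, the Python sketch below tests whether a location falls inside a circle using great-circle (haversine) distance. It assumes a spherical Earth with a mean radius of 6371 km; SaTScan's exact distance formula and Earth radius may differ, so treat the numbers as illustrative. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def in_circle(center, point, radius_km):
    """True if point (lat, lon) lies within radius_km of center along the surface."""
    return great_circle_km(*center, *point) <= radius_km


# One degree of latitude is about 111.2 km everywhere, while one degree of
# longitude shrinks toward the poles, which a flat lat/long grid would ignore.
print(round(great_circle_km(0, 0, 1, 0), 1))    # 111.2
print(round(great_circle_km(60, 0, 60, 1), 1))  # 55.6
```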
Use the Bernoulli model when you have binary data, such as cases and controls, late and early
stage cancer or people with and without a disease. Use the Poisson model when you have cases
and a background population at risk, such as population numbers from the census.
8. SaTScan adjusts for categorical covariates, but I want to adjust for a continuous variable. Is
that possible?
One way to do this is to categorize the continuous variable. A better approach is to (i) calculate
the adjustment using a regular statistical software package such as SAS, (ii) use the result from
that analysis to calculate the covariate adjusted expected number of cases at each location, and
(iii) use these expected values instead of the population in the population file. With this approach,
there should not be any covariates in either the case or the population files.
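Steps (ii) and (iii) can be sketched in Python, assuming step (i) has already produced a predicted probability (or expected count) of being a case for each individual or population stratum, together with its location ID. The sketch sums these predictions by location and formats them for use in place of population counts. The column layout (location ID, year, value), the function names and the example predictions are assumptions for illustration; check the population file description in this guide for the exact format your analysis requires.

```python
from collections import defaultdict


def expected_by_location(predictions):
    """Sum model-based expected counts per location.

    predictions: iterable of (location_id, expected_value) pairs, e.g. one
    pair per person with that person's predicted probability of being a case.
    """
    totals = defaultdict(float)
    for location_id, expected in predictions:
        totals[location_id] += expected
    return dict(totals)


def population_file_lines(expected, year):
    """Format expected counts as whitespace-separated lines, used in place of
    population numbers (assumed layout: <location id> <year> <value>)."""
    return [f"{loc} {year} {value:.6f}" for loc, value in sorted(expected.items())]


# Hypothetical predictions from a regression fitted elsewhere (e.g. in SAS).
preds = [("A", 0.10), ("A", 0.20), ("B", 0.50)]
for line in population_file_lines(expected_by_location(preds), 2014):
    print(line)  # "A 2014 0.300000", then "B 2014 0.500000"
```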
9. What should I use as the maximum geographical cluster size? Is that an arbitrary choice?
If you don't want the choice to be arbitrary, choose 50% of the population as the maximum
geographical cluster size. SaTScan will then evaluate very small and very large clusters, and
everything in between. To find a good collection of non-overlapping clusters, use the Gini index
feature.
10. Why can't I select a maximum geographical cluster size that is larger than 50% of the
population?
Clusters of excess risk that are larger than 50% of the population at risk are better viewed as
clusters with lower risk outside the scanning window, and the area outside will always have a very
irregular geographical shape. If there is interest in clusters with lower risk than expected, it is
more appropriate to select the low rates option on the analysis tab.
11. I have memory problems when running SaTScan. What should I do?
Make sure you are running SaTScan in 64-bit mode. For this you must (i) have a 64-bit computer,
(ii) run SaTScan version 8 or later, and (iii) have 64-bit Java installed on your computer.
Results
12. I get an error stating that the output file could not be created. Why?
In Windows, permission to write to the "Program Files" folder is given only to administrators and
power users of that machine. If the output file path includes the "Program Files" folder and you
do not have administrative or power user privileges on your computer, Windows prevents
SaTScan from creating the output file in the designated location. The solution is to specify a
different output file name using a different directory.
13. Since the SaTScan results are based on Monte Carlo simulated random data, why are the p-values the same when I run the analysis twice?
All computer-based simulations are based on pseudo-random number generators. When the same
seed is used, exactly the same sequence of pseudo-random numbers will be generated. Since
SaTScan uses the same seed for every run, you obtain the same result for two runs when the input
data is the same.
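The effect of a fixed seed is easy to see with any pseudo-random number generator. The Python sketch below uses Python's own `random.Random`, not SaTScan's generator, and a toy simulation (assigning cases to locations uniformly at random) chosen only for illustration; `toy_simulation` is a hypothetical name. The point it shows is the one made above: the same seed always reproduces the same simulated values.

```python
import random


def toy_simulation(seed, n_sims=5, n_locations=10, total_cases=50):
    """Return the largest location count in each of n_sims simulated data sets,
    where every case is assigned to a location uniformly at random."""
    rng = random.Random(seed)  # private generator with a fixed seed
    maxima = []
    for _ in range(n_sims):
        counts = [0] * n_locations
        for _ in range(total_cases):
            counts[rng.randrange(n_locations)] += 1
        maxima.append(max(counts))
    return maxima


run1 = toy_simulation(seed=12345)
run2 = toy_simulation(seed=12345)
print(run1 == run2)  # True: identical seed, identical pseudo-random sequence
```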
14. I ran exactly the same data using two different versions of SaTScan, but the p-values are
different. Why? Which one is the correct one?
Compared to v2.1, the pseudo-random number generation is done slightly differently in SaTScan
v3.0 and later, typically resulting in slightly different p-values. In earlier versions, SaTScan
defined overlapping clusters based on whether the two circles were overlapping. In SaTScan
v5.0 and later, two clusters overlap if they have at least one location ID in common. These two
definitions usually agree, but in rare cases they may differ. If you were running the
Poisson model, another possible reason for the difference is that SaTScan v5.0 and later uses a
more precise algorithm for calculating the expected number of cases when the population dates in
the population file are specified using days rather than months or years.
While p-values from all versions are valid and correct, only one p-value should be used. We
recommend always using the p-value that was calculated first.
15. I found a spatial cluster, but some of the locations in the cluster have zero cases. Why are they
included in the cluster?
A location, such as a zip code, is included in the cluster if its centroid is included in the scanning circle. This
means that for sparse data, it is very common to have locations with zero cases in the cluster. This
typically happens when the location's expected count is low and it is surrounded by other locations with
many cases. It is also important to note that while SaTScan gives the exact boundaries of the
detected cluster, these boundaries are not reliable, and there may be areas within the detected
cluster that do not belong to the true cluster and vice versa. This can happen (i) by chance,
where some areas with elevated risk still have zero cases due to small expected counts (e.g., 1
expected case under H0, 2 expected due to a RR of 2, and 0 observed, since a Poisson distribution with
mean 2 generates a zero count with substantial probability, e^-2 ≈ 0.14), (ii) if the true cluster is not exactly
circular, or (iii) if there are several clusters close to each other with sparse population in between.
In the latter case SaTScan may present one combined cluster rather than 2-3 separate clusters.
Interpretation
16. In SaTScan, after adjusting for population density and covariates such as age, the null hypothesis is complete spatial randomness. For most disease data that is not true. Does this
mean that the null hypothesis is wrong?
When accepting the notion of statistical hypothesis testing one must also accept the fact that the
null hypothesis is never true. For example, when comparing the efficacy of two different surgical
procedures in a clinical trial we know for sure that their efficacy cannot be equal, but we still use
equality as the null hypothesis since we are interested in finding out whether one is better than the
other. Likewise, with geographical data we know that disease risk is not the same everywhere but
we still use it as the null hypothesis since we are interested in finding locations with excess risk.
Hence, the null hypothesis is wrong in the sense that we know it is not true but it is not wrong in
the sense that we should not use it.
17. Does SaTScan assume that there is no spatial auto-correlation in the data? (Note: Spatial auto-correlation means that the location of disease cases is dependent on the location of other disease
cases, such as with an infectious disease where an infected individual is likely to infect those
living close by.)
No, SaTScan does not assume that there is no spatial auto-correlation in the data. Rather, it is a
test of whether there is spatial auto-correlation or other divergences from the null hypothesis. In
this sense it is analogous to a statistical test for normality, which does not assume that the data is
normally distributed but tests whether it is.
18. If I am interested in whether there is spatial auto-correlation in the data, why should I use the
spatial scan statistic rather than a traditional spatial auto-correlation test?
If you are only interested in whether there is spatial auto-correlation or not, but don't care about
cluster locations, there are tests for spatial auto-correlation / global clustering that have higher
power than the spatial scan statistic and should be used instead. The spatial scan statistic should
be used when you are interested in the detection and statistical significance of local clusters.
19. In spatial statistics, is it not always important to adjust for spatial auto-correlation? This
cannot be done in SaTScan.
Whether to adjust for spatial auto-correlation depends on the question being asked from the data.
As an example, let's assume that we have geographical data on people who get sick due to food
poisoning. In such data there is clearly spatial auto-correlation, since bad food sold at restaurants
or grocery stores is often sold to multiple customers, many of whom will live in the same
neighborhood.
If we are doing spatial regression trying to determine what neighborhood characteristics such as
mean income, house values, educational levels or ethnic origin contribute to a higher risk for food
poisoning, it is critical to adjust for the spatial auto-correlation in the data. If not, the confidence
in the risk relationships will be overestimated with biased p-values that are too small, providing
statistically significant results when none exist. Here, the null hypothesis should be that there is
spatial auto-correlation and the alternative hypothesis that there are geographical differences in
the risk of food poisoning.
On the other hand, if we are interested in quickly detecting food poisoning outbreaks, we should
not adjust for the spatial auto-correlation since we are interested in detecting clusters due to such
correlation, and if they are adjusted away, important clusters may go undetected. Here, the null
hypothesis is that the food poisoning cases are geographically randomly distributed (adjusted for
population density etc.) and the alternative hypothesis is that there is some clustering either due to
differences in underlying risk factors or spatial auto-correlation. Once the location of a cluster has
been detected, it is for the local health officials to determine the source of the cluster to prevent
further illness.
20. If there are multiple clusters in the data, does that mean that the p-values are more likely to be
significant than their 0.05 nominal significance level suggests, so that chance clusters are
detected too often?
No. The opposite is actually true. Looking at United States mortality, suppose we have 1000
cases of a disease in Seattle and 30 in New York City. Seattle is clearly a significant cluster but
30 cases in New York City out of 1030 in all of the USA is not exceptional since the City has
about 3 percent of the U.S. population. If we accept that there is a cluster in Seattle though, and if
we adjust for that by removing Seattle from the analysis, then 30 cases in the City out of 30
nationwide is statistically significant. This is similar to a regular multiple regression, where if we
adjust for one variable, another variable may suddenly become statistically significant. Note that
the opposite is also true. If we remove an area with significantly fewer cases than expected, then a
significant cluster with an excess number of cases may become non-significant.
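The arithmetic of the Seattle and New York City example can be checked with a one-sided binomial tail probability. This is a minimal sketch, not the scan statistic itself: it assumes New York City's share of the relevant population is 3% in both cases, ignoring the small change in that share when Seattle is removed. `binom_upper_tail` is a hypothetical helper that sums probabilities on the log scale to avoid overflow with large binomial coefficients.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing pmf terms on the log scale."""
    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    return sum(math.exp(log_pmf(i)) for i in range(k, n + 1))


share_nyc = 0.03  # about 3 percent of the U.S. population

# With Seattle included: 30 of 1030 cases in NYC is unremarkable.
print(binom_upper_tail(30, 1030, share_nyc))  # roughly 0.6
# With Seattle removed: 30 of 30 cases in NYC is overwhelming evidence.
print(binom_upper_tail(30, 30, share_nyc))    # about 2e-46
```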
21. For count data, the spatial scan statistic uses a particular alternative hypothesis with an excess
risk in a circular cluster, where the number of cases follows a Poisson or Bernoulli distribution.
Does this mean that it can only be used to detect such alternative hypotheses?
Many proposed and widely used test statistics do not specify an alternative hypothesis at all. This
neither means that they cannot be used for any alternative hypotheses nor that they are good for
all alternatives. Likewise, if an explicit alternative is defined, as with the spatial scan statistic, that
does not mean that it cannot be used for other alternative hypotheses as well. It is simply a
question of the test statistic having good power for some alternative hypotheses and low power
for others. The advantage of having a well-specified alternative is that it gives some information
about the alternatives for which the test can be expected to have good power.
22. For the exponential (normal) model, it is assumed that the survival times follow an exponential
(normal) distribution. Are the results biased if the survival times follow a different distribution?
No matter which distribution generated the survival times, the p-values from the statistical
inference are still valid and unbiased. This is because, rather than generating the random data sets from
an exponential distribution, each random data set is a spatial permutation of the observed survival times. A
greatly misspecified distribution may lead to a loss in power, though. For example, if the data are
Bernoulli distributed, the exponential model has less power to detect a cluster than the Bernoulli
model. For continuous distributions such as gamma and lognormal, the exponential model has
been shown to work well. The same reasoning is true with respect to the normal model.
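The following is a minimal Monte Carlo permutation test in Python, included to illustrate why the p-values do not depend on the distribution of the survival times. It is a sketch under simplifying assumptions: locations lie on a line, the candidate clusters are hypothetical windows of five consecutive locations, and the test statistic is the largest difference between mean survival outside and inside a window. SaTScan itself uses the exponential-model likelihood ratio over its own set of scanning windows. The survival times are simulated from a lognormal distribution, with a planted cluster of short survival times at locations 0 to 4.

```python
import random


def window_stat(times, zones):
    """Max over candidate zones of (mean survival outside - mean survival inside)."""
    n, total = len(times), sum(times)
    best = float("-inf")
    for zone in zones:
        inside = sum(times[i] for i in zone)
        mean_in = inside / len(zone)
        mean_out = (total - inside) / (n - len(zone))
        best = max(best, mean_out - mean_in)
    return best


def permutation_p_value(times, zones, reps=999, seed=1):
    """Monte Carlo p-value. Each replicate randomly permutes the observed
    survival times across locations, so no parametric distribution is assumed."""
    rng = random.Random(seed)
    observed = window_stat(times, zones)
    shuffled = list(times)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        if window_stat(shuffled, zones) >= observed:
            exceed += 1
    return (exceed + 1) / (reps + 1)


# Hypothetical data: 40 locations with lognormal (not exponential) survival times.
data_rng = random.Random(42)
n_locations = 40
times = [data_rng.lognormvariate(0.0, 0.3) for _ in range(n_locations)]
for i in range(5):
    times[i] *= 0.05  # planted cluster of short survival at locations 0-4

# Candidate zones: every window of 5 consecutive locations.
zones = [list(range(s, s + 5)) for s in range(n_locations - 4)]

p = permutation_p_value(times, zones)
print(f"permutation p-value: {p:.3f}")
```

Every replicate is a reshuffling of the observed survival times, so the null distribution of the statistic is built from the data themselves. A misspecified model can still lower power, but it cannot invalidate the p-value. With 999 replicates the smallest attainable p-value is 1/1000 = 0.001.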
Operating Systems
23. Is SaTScan available for Windows/Mac/Linux?
The SaTScan software for Windows, Mac and Linux can be downloaded from the www.satscan.org web
site.
SaTScan Bibliography
Different SaTScan analysis options were developed at different times and they are described in different
scientific publications. The following bibliography contains selected papers and reports intended to help
you:
1. Find the methodological paper(s) in which the various analysis options are presented and
discussed in more detail than what is available here in the SaTScan User Guide.
2. Find applications in different scientific areas.
3. Determine the relevant scientific papers to cite.
Suggested Citations
The SaTScan software may be used freely, with the requirement that proper references are provided to the
scientific papers describing the statistical methods. For the most common analyses, the suggested
citations are:
Bernoulli, Discrete Poisson and Continuous Poisson Models: Kulldorff M. A spatial scan statistic.
Communications in Statistics: Theory and Methods, 26:1481-1496, 1997. [online]
Space-Time Permutation Model: Kulldorff M, Heffernan R, Hartman J, Assunção RM, Mostashari F. A
space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine, 2:216-224, 2005. [online]
Multinomial Model: Jung I, Kulldorff M, Richard OJ. A spatial scan statistic for multinomial data.
Statistics in Medicine, 2010, epub. [online]
Ordinal Model: Jung I, Kulldorff M, Klassen A. A spatial scan statistic for ordinal data. Statistics in
Medicine, 2007; 26:1594-1607. [online]
Exponential Model: Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data.
Biometrics, 2007; 63:109-118. [online]
Normal Model without Weights: Kulldorff M, Huang L, Konty K. A scan statistic for continuous data
based on the normal probability model. International Journal of Health Geographics, 2009, 8:58. [online]
Normal Model with Weights: Huang L, Tiwari R, Zuo J, Kulldorff M, Feuer E. Weighted
normal spatial scan statistic for heterogeneous population data. Journal of the American Statistical
Association, 2009, 104:886-898. [online]
Software: Kulldorff M. and Information Management Services, Inc. SaTScan™ v8.0: Software for the
spatial and space-time scan statistics. http://www.satscan.org/, 2009.
Users of SaTScan should in any reference to the software note that: SaTScan™ is a trademark of Martin
Kulldorff. The SaTScan™ software was developed under the joint auspices of (i) Martin Kulldorff, (ii)
the National Cancer Institute, and (iii) Farzad Mostashari of the New York City Department of Health and
Mental Hygiene.
Adjustments
Adjusting for Covariates
References [1] and [8] above, plus:
21. Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer in northeastern United States: A
geographical analysis. American Journal of Epidemiology, 146:161-170, 1997. [online]
22. Kleinman K, Abrams A, Kulldorff M, Platt R. A model-adjusted space-time scan statistic with an
application to syndromic surveillance. Epidemiology and Infection, 2005, 133:409-419.
23. Klassen A, Kulldorff M, Curriero F. Geographical clustering of prostate cancer grade and stage at
diagnosis, before and after adjustment for risk factors. International Journal of Health Geographics,
2005, 4:1. [online]
Iterative Scan Statistics, Adjusting for More Likely Clusters
24. Zhang Z, Kulldorff M, Assunção R. Spatial scan statistics adjusted for multiple clusters. Journal of
Probability and Statistics, 2010, 642379.
Computational Aspects
Algorithms
25. Kulldorff M. Spatial scan statistics: Models, calculations and applications. In Balakrishnan and Glaz
(eds), Recent Advances on Scan Statistics and Applications. Boston, USA: Birkhäuser, 1999. [online]
Random Number Generator
26. Lehmer DH. Mathematical methods in large-scale computing units. In Proceedings of the second
symposium on large scale digital computing machinery. Cambridge, USA: Harvard Univ. Press,
1951.
27. Park SK, Miller KW. Random number generators: Good ones are hard to find. Communications of
the ACM, 31:1192-1201, 1988.
Macros
28. Abrams AM, Kleinman KP. A SaTScan (TM) macro accessory for cartography (SMAC) package
implemented with SAS (R) software. International Journal of Health Geographics, 6:6, 2007. [online]
38. Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. International
Journal of Health Geographics, 4:11, 2005. [online]
39. Kulldorff M, Song C, Gregorio D, Samociuk H, DeChello L. Cancer map patterns: Are they random
or not? American Journal of Preventive Medicine, 30:S37-49, 2006. [online]
40. Duczmal L, Kulldorff M, Huang L. Evaluation of spatial scan statistics for irregular shaped clusters.
Journal of Computational and Graphical Statistics, 15:428-442, 2006.
41. Aamodt G, Samuelsen SO, Skrondal A. A simulation study of three methods for detecting disease
clusters. International Journal of Health Geographics, 5:15, 2006. [online]
42. Jackson MC, Huang L, Luo J, Hachey M, Feuer E. Comparison of tests for spatial heterogeneity on
data with global clustering patterns and outliers. International Journal of Health Geographics.
2009;8:55. [online]
43. Wheeler DC. A comparison of spatial clustering and cluster detection techniques for childhood
leukemia incidence in Ohio, 1996-2003. International Journal of Health Geographics. 2007;6:13.
[online]
44. Goujon-Bellec S, Demoury C, Guyot-Goubin A, Hémon D, Clavel J. Detection of clusters of a rare
disease over a large territory: performance of cluster detection methods. International Journal of
Health Geographics. 2011;10:53. [online]
52. Chen JH, Weng C, Chang HG. Using space-time scan statistic to detect pertussis and shigellosis
outbreaks. CSTE Annual Conference, 2013. [online]
53. Nana Yakam A, Noeske J, Dambach P. Spatial analysis of tuberculosis in Douala, Cameroon:
clustering and links with socio-economic status. The International Journal of Tuberculosis and Lung
Disease, 18:292-297, 2014.
54. Shea KM, Kammerer JS, Winston CA, Navin TR, Horsburgh CR. Estimated rate of reactivation of
latent tuberculosis infection in the United States, overall and by population subgroup. American
Journal of Epidemiology, 179:216-25, 2014.
55. Souris M, Selenic D, Khaklang S, Ninphanomchai S, Minet G, Gonzalez JP, Kittayapong P. Poultry
farm vulnerability and risk of avian influenza re-emergence in Thailand. International Journal of
Environmental Research and Public Health, 11:934-951, 2014. [online]
66. Oviedo M, Munoz P, Dominguez A, Carmona G, Batalla J, Borras E, Jansà JM. Evaluation of Mass
Vaccination Programmes: The experience of hepatitis A in Catalonia (in Spanish). Revista Española
de Salud Pública, 83:697-709, 2009. [online]
67. Luquero FJ, Banga CN, Remartínez D, Palma PP, Baron E, Grais RF. Cholera epidemic in Guinea-Bissau (2008): the importance of "place". PLoS One, 6:e19005, 2011. [online]
68. Bompangue Nkoko D, Giraudoux P, Plisnier PD, Tinda AM, Piarroux M, Sudre B, Horion S,
Tamfum JJ, Ilunga BK, Piarroux R. Dynamics of cholera outbreaks in Great Lakes region of Africa,
1978-2008. Emerging Infectious Diseases, 17:2026-2034, 2011. [online]
69. Tang F, Cheng Y, Bao C, Hu J, Liu W, Liang W, Wu Y, Norris J, Peng Z, Yu R, Shen H, Chen F.
Spatio-temporal trends and risk factors for Shigella from 2001 to 2011 in Jiangsu province, People's
Republic of China. PLoS ONE, 9:e83487, 2014. [online]
70. Chan TC, Hwang JS, Chen RH, King CC, Chiang PH. Spatio-temporal analysis on enterovirus cases
through integrated surveillance in Taiwan. BMC Public Health, 14:11, 2014. [online]
71. Briggs ADM, Boxall NS, van Santen D, Chalmers RM, McCarthy ND. Approaches to the detection
of very small, common, and easily missed outbreaks that together contribute substantially to human
Cryptosporidium infection. Epidemiology and Infection, epub, 2014.
72. Wang J, Cao Z, Zeng DD, Wang Q, Wang X, Qian H. Epidemiological Analysis, Detection, and
Comparison of Space-Time Patterns of Beijing Hand-Foot-Mouth Disease (2008-2012). PLoS One,
9:e2745, 2014. [online]
73. Viñas M, Tuduri E, Galar A, Yih WK, Pichel M, Stelling J, Brengi S, Della Gaspera A, van der Ploeg
C, Bruno S, Rogé A, Caffer M, Kulldorff M, Galas M. Laboratory-based prospective surveillance for
community outbreaks of Shigella spp. in Argentina. PLoS Neglected Tropical Diseases,
7:e2521, 2013.
81. Cuadros DF, Abu-Raddad LJ. Spatial variability in HIV prevalence declines in several countries in
sub-Saharan Africa. Health and Place, epub, 2014.
96. Mirghani SE, Nour BY, Bushra SM, Elhassan IM, Snow RW, Noor AM. The spatial-temporal
clustering of Plasmodium falciparum infection over eleven years in Gezira State, The Sudan. Malaria
Journal, 9:172, 2010. [online]
97. Haque U, Sunahara T, Hashizume M, Shields T, Yamamoto T, Haque R, Glass GE. Malaria
prevalence, risk factors and spatial distribution in a hilly forest area of Bangladesh. PLoS ONE 6(4):
e18908, 2011. [online]
98. Schmidt W-P, Suzuki M, Dinh Thiem V, White RG, Tsuzuki A, Yoshida LM, Yanai H, Haque U,
Huu Tho L, Duc Anh D, Ariyoshi K. Population Density, Water Supply, and the Risk of Dengue Fever
in Vietnam: Cohort Study and Spatial Analysis. PLoS Medicine, 8:8, e1001082, 2011. [online]
99. Winskill P, Rowland M, Mtove G, Malima RC, Kirby MJ. Malaria risk factors in north-east
Tanzania. Malaria Journal 10:98, 2011. [online]
100. Washington CH, Radday J, Streit TG, Boyd HA, Beach MJ, Addiss DG, Lovince R, Lovegrove MC,
Lafontant JG, Lammie PJ, Hightower AW. Spatial clustering of filarial transmission before and after
a mass drug administration in a setting of low infection prevalence. Filaria Journal, 3: 3, 2004.
[online]
101. Bhattarai NR, Van der Auwera G, Rijal S, Picado A, Speybroeck N, Khanal B, De Doncker S, Lal
Das M, Ostyn B, Davies C, Coosemans M, Berkvens D, Boelaert M, Dujardin JC. Domestic animals
and epidemiology of visceral leishmaniasis, Nepal. Emerging Infectious Diseases, 16:231-237, 2010.
[online]
102. Cook J, Kleinschmidt I, Schwabe C, Nseng G, Bousema T, Corran PH, Riley EM, Drakeley CJ.
Serological markers suggest heterogeneity of effectiveness of malaria control interventions on Bioko
Island, Equatorial Guinea. PLoS One, 6:e25137, 2011. [online]
103. Nourein AB, Abass MA, Nugud AH, El Hassan I, Snow RW, Noor AM. Identifying residual foci of
Plasmodium falciparum infections for malaria elimination: the urban context of Khartoum, Sudan.
PLoS One, 6:e16948, 2011. [online]
104. Rochlin I, Turbow D, Gomez F, Ninivaggi DV, Campbell SR. Predictive mapping of human risk for
West Nile virus (WNV) based on environmental and socioeconomic factors. PLoS One. 6:e23280,
2011. [online]
105. Impoinvil DE, Solomon T, Schluter WW, Rayamajhi A, Bichha RP, Shakya G, Caminade C, Baylis
M. The spatial heterogeneity between Japanese encephalitis incidence distribution and environmental
variables in Nepal. PLoS One, 6:e22192, 2011. [online]
106. Bejon P, Turner L, Lavstsen T, Cham G, Olotu A, Drakeley CJ, Lievens M, Vekemans J, Savarese
B, Lusingu J, von Seidlein L, Bull PC, Marsh K, Theander TG. Serological evidence of discrete
spatial clusters of Plasmodium falciparum parasites. PLoS One, 6:e21711, 2011. [online]
107. Jones SG, Conner W, Song B, Gordon D, Jayakaran A. Comparing spatio-temporal clusters of
arthropod-borne infections using administrative medical claims and state reported surveillance data.
Spatial and Spatio-Temporal Epidemiology, 2012.
108. Sindato C, Karimuribo ED, Pfeiffer DU, Mboera LE, Kivaria F, Dautu G, Bernard B, Paweska JT.
Spatial and temporal pattern of Rift Valley fever outbreaks in Tanzania; 1930 to 2007. PLoS One,
9(2), e88897, 2014. [online]
109. Lal A, Hales S. Heterogeneity in hotspots: spatio-temporal patterns in neglected parasitic diseases.
Epidemiology and Infection, epub, 2014.
110. Bejon P, Williams TN, Nyundo C, Hay SI, Benz D, Gething PW, Otiende M, Peshu J, Bashraheil
M, Greenhouse B, Bousema T, Bauni E, Marsh K, Smith DL, Borrmann S. A micro-epidemiological
analysis of febrile malaria in Coastal Kenya showing hotspots within hotspots. eLife, epub, 2014.
111. Mosha JF, Sturrock HJ, Greenwood B, Sutherland CJ, Gadalla NB, Atwal S, Hemelaar S, Brown
JM, Drakeley C, Kibiki G, Bousema T, Chandramohan D, Gosling RD. Hot spot or not: a
comparison of spatial statistical methods to predict prospective malaria infections. Malaria Journal,
13:53, 2014. [online]
112. Carson C, Lavender CJ, Handasyde KA, O'Brien CR, Hewitt N, Johnson PD, Fyfe JA. Potential
wildlife sentinels for monitoring the endemic spread of human buruli ulcer in South-East Australia.
PLoS Neglected Tropical Diseases, 8:e2668, 2014. [online]
113. Delgado-Ratto C, Soto-Calle VE, Van den Eede P, Gamboa D, Rosas A, Abatih EN, Rodriguez
Ferrucci H, Llanos-Cuentas A, Van Geertruyden JP, Erhart A, D'Alessandro U. Population structure
and spatio-temporal transmission dynamics of Plasmodium vivax after radical cure treatment in a
rural village of the Peruvian Amazon. Malaria Journal, 13:8, 2014. [online]
114. Kracalik I, Malania L, Tsertsvadze N, Manvelyan J, Bakanidze L, Imnadze P, Tsanava S, Blackburn
JK. Human cutaneous anthrax, Georgia 2010-2012. Emerging Infectious Diseases, 20:261-264,
2014. [online]
115. Liu C, Liu Q, Lin H, Xin B, Nie J. Spatial analysis of dengue fever in Guangdong Province, China,
2001-2006. Asia Pacific Journal of Public Health, 26:58-66, 2014.
116. Mulatti P, Mazzucato M, Montarsi F, Ciocchetta S, Capelli G, Bonfanti L, Marangon S.
Retrospective space-time analysis methods to support West Nile virus surveillance activities.
Epidemiology and Infection, epub, 2014.
117. Mollalo A, Alimohammadi A, Shirzadi MR, Malek MR. Geographic Information System-Based
Analysis of the Spatial and Spatio-Temporal Distribution of Zoonotic Cutaneous Leishmaniasis in
Golestan Province, North-East of Iran. Zoonoses and Public Health, epub, 2014.
123. Dreesman J, Scharlach H. Spatial-statistical analysis of infectious disease notification data in Lower
Saxony. Gesundheitswesen, 66: 783-789, 2004.
124. Polack SR, Solomon AW, Alexander NDE, Massae PA, Safari S, Shao JF, Foster A, Mabey DC.
The household distribution of trachoma in a Tanzanian village: an application of GIS to the study of
trachoma. Transactions of the Royal Society of Tropical Medicine and Hygiene, 99: 218-225, 2005.
Antimicrobial Resistance
125. Ghosh AN, Bhatta DR, Ansari MT, Tiwari HK, Mathuria JP, Gaur A, Supram HS, Gokhale S.
Application of WHONET in the Antimicrobial Resistance Surveillance of Uropathogens: A First
User Experience from Nepal. Journal of Clinical and Diagnostic Research, 7:845-848, 2013. [online]
Syndromic Surveillance
132. Minnesota Department of Health. Syndromic Surveillance: A New Tool to Detect Disease
Outbreaks. Disease Control Newsletter, 32:16-17, 2004. [online]
133. Kleinman K, Abrams A, Kulldorff M, Platt R. A model-adjusted space-time scan statistic with an
application to syndromic surveillance. Epidemiology and Infection, 2005, 133:409-419.
134. Nordin JD, Goodman MJ, Kulldorff M, Ritzwoller DP, Abrams AM, Kleinman K, Levitt MJ,
Donahue J, Platt R. Simulated anthrax attacks and syndromic surveillance. Emerging Infectious
Diseases, 2005, 11:1394-98. [online]
135. Besculides M, Heffernan R, Mostashari F, Weiss D. Evaluation of school absenteeism data for early
outbreak detection, New York City. BMC Public Health, 5:105, 2006. [online]
136. Horst MA, Coco AS. Observing the spread of common illnesses through a community: Using
geographic information systems (GIS) for surveillance. Journal of the American Board of Family
Medicine, 23:32-41, 2010. [online]
137. van den Wijngaard CC, van Asten L, van Pelt W, Doornbos G, Nagelkerke NJ, Donker GA, van der
Hoek W, Koopmans MP. Syndromic surveillance for local outbreaks of lower-respiratory infections:
would it work? PLoS One, 29;e10406, 2010. [online]
153. Han DW, Rogerson PA, Nie J, Bonner MR, Vena JE, Vito D, Muti P, Trevisan M, Edge SB,
Freudenheim JL. Geographic clustering of residence in early life and subsequent risk of breast
cancer (United States). Cancer Causes and Control, 15:921-929, 2004.
154. Campo J, Comber H, Gavin AT. All-Ireland Cancer Statistics 1998-2000. Northern Ireland Cancer
Registry / National Cancer Registry, 2004. [online]
155. Hayran M. Analyzing factors associated with cancer occurrence: A geographical systems approach.
Turkish Journal of Cancer, 34:67-70, 2004. [online]
156. Fukuda Y, Umezaki M, Nakamura K, Takano T. Variations in societal characteristics of spatial
disease clusters: examples of colon, lung and breast cancer in Japan. International Journal of Health
Geographics, 4:16, 2005. [online]
157. Ozonoff A, Webster T, Vieira V, Weinberg J, Ozonoff D, Aschengrau A. Cluster detection methods
applied to the Upper Cape Cod cancer data. Environmental Health: A Global Access Science Source,
4:19, 2005. [online]
158. DeChello LM, Sheehan TJ. The geographic distribution of melanoma incidence in Massachusetts,
adjusted for covariates. Int J Health Geogr. 2006;5:31 [online]
159. Gregorio DI, Samociuk H, DeChello L, Swede H. Effects of study area size on geographic
characterizations of health events: prostate cancer incidence in Southern New England, USA, 1994-1998.
Int J Health Geogr. 5:8, 2006. [online]
160. Chen Y, Yi Q, Mao Y. Cluster of liver cancer and immigration: a geographic analysis of incidence
data for Ontario 1998-2002. Int J Health Geogr. 7:28, 2008. [online]
161. Lorenzo-Luaces Alvarez P, Guerra-Yi ME, Faes C, Galán Alvarez Y, Molenberghs G. Spatial
analysis of breast and cervical cancer incidence in small geographical areas in Cuba, 1999-2003.
European Journal of Cancer Prevention, 18:395-403, 2009.
162. Amin R, Bohnert A, Holmes L, Rajasekaran A, Assanasen C. Epidemiologic mapping of Florida
childhood cancer clusters. Pediatric Blood Cancer, 54:511-518, 2010.
163. Liu-Mares W, MacKinnon JA, Sherman R, Fleming LE, Rocha-Lima C, Hu JJ, Lee DJ. Pancreatic
cancer clusters and arsenic-contaminated drinking water wells in Florida. BMC Cancer, 13, 111,
2013. [online]
164. Baastrup Nordsborg R, Meliker JR, Kjær Ersbøll A, Jacquez GM, Raaschou-Nielsen O. Space-Time
Clustering of Non-Hodgkin Lymphoma Using Residential Histories in a Danish Case-Control Study.
PLoS One, 8, e60800, 2013. [online]
165. Amin R, Hendryx M, Shull M, Bohnert A. A cluster analysis of pediatric cancer incidence rates in
Florida: 2000-2010. Statistics in Public Policy, 1:69-77, 2014. [online]
169. Klassen A, Curriero F, Kulldorff M, Alberg AJ, Platz EA, Neloms ST. Missing stage and grade in
Maryland prostate cancer surveillance data, 1992-1997. American Journal of Preventive Medicine,
30:S77-87, 2006. [online]
170. Pollack LA, Gotway CA, Bates JH, Parikh-Patel A, Richards TB, Seeff LC, Hodges H, Kassim S.
Use of the spatial scan statistic to identify geographic variations in late stage colorectal cancer in
California (United States). Cancer Causes and Control, 17:449-457, 2006.
171. DeChello LM, Sheehan TJ. Spatial analysis of colorectal cancer incidence and proportion of late-stage in Massachusetts residents: 1995-1998. Int J Health Geogr. 2007;6:20. [online]
Cardiovascular Diseases
175. Kuehl KS, Loffredo CA. A cluster of hypoplastic left heart malformation in Baltimore, Maryland.
Pediatric Cardiology, 27:25-31, 2006.
176. Li XY, Chen K. Scan statistic theory and its application in spatial epidemiology (in Chinese).
Zhonghua Liu Xing Bing Xue Za Zhi., 29:828-31, 2008.
Liver Diseases
181. Ala A, Stanca CM, Bu-Ghanim M, Ahmado I, Branch AD, Schiano TD, Odin JA, Bach N. Increased
prevalence of primary biliary cirrhosis near superfund toxic waste sites. Hepatology, 43:525-531,
2006.
182. Stanca CM, Babar J, Singal V, Ozdenerol E, Odin JA. Pathogenic role of environmental toxins in
immune-mediated liver diseases. Journal of Immunotoxicology, 5:59-68, 2008.
183. McNally RJQ, Ducker S, James OFW. Are Transient Environmental Agents Involved in the Cause
of Primary Biliary Cirrhosis? Evidence from Space-Time Clustering Analysis. Hepatology, 50:1169-1174, 2009.
Diabetes
184. Green C, Hoppa RD, Young TK, Blanchard JF. Geographic analysis of diabetes prevalence in an
urban area. Social Science and Medicine, 57:551-560, 2003.
185. Aamodt G, Stene LC, Njølstad PR, Søvik O, Joner G, for the Norwegian Childhood Diabetes Study
Group. Spatiotemporal trends and age-period-cohort modelling of the incidence of type 1 diabetes
among children ages <15 years in Norway 1973-1982 and 1989-2003. Diabetes Care, 30:884-889,
2007.
Neurological Diseases
189. Sabel CE, Boyle PJ, Löytönen M, Gatrell AC, Jokelainen M, Flowerdew R, Maasilta P. Spatial
clustering of amyotrophic lateral sclerosis in Finland at place of birth and place of death. American
Journal of Epidemiology, 157: 898-905, 2003.
197. Grady SC, Enander H. Geographic analysis of low birthweight and infant mortality in Michigan
using automated zoning methodology. International Journal of Health Geographics, 2009, 8:10.
[online]
Pediatrics
198. George M, Wiklund L, Aastrup M, Pousette J, Thunholm B, Saldeen T, Wernroth L, Zaren B,
Holmberg L. Incidence and geographical distribution of sudden infant death syndrome in relation to
content of nitrate in drinking water and groundwater levels. European Journal of Clinical
Investigation, 31: 1083-1094, 2001.
199. Sankoh OA, Ye Y, Sauerborn R, Muller O, Becher H. Clustering of childhood mortality in rural
Burkina Faso. International Journal of Epidemiology, 30:485-492, 2001. [online]
200. Ali M, Asefaw T, Byass P, Beyene H, Karup Pedersen F. Helping northern Ethiopian communities
reduce childhood mortality: population-based intervention trial. Bulletin of the World Health
Organization, 83:27-33, 2005. [online]
201. Awini E, Mattah P, Sankoh O, Gyapong M. Spatial variations in childhood mortalities at the
Dodowa Health and Demographic Surveillance System site of the INDEPTH Network in Ghana.
Tropical Medicine and International Health, 2010.
Geriatrics
202. Yiannakoulias N, Rowe BH, Svenson LW, Schopflocher DP, Kelly K, Voaklander DC. Zones of
prevention: the geography of fall injuries in the elderly. Social Science and Medicine, 57:2065-73,
2003.
203. Vaneckova P, Beggs PJ, Jacobson CR. Spatial analysis of heat-related mortality among the elderly
between 1993 and 2004 in Sydney, Australia. Social Science and Medicine, 70:293-304, 2010.
Psychology
204. Margai F, Henry N. A community-based assessment of learning disabilities using environmental and
contextual risk factors. Social Science and Medicine, 56: 1073-1085, 2003.
Brain Imaging
205. Yoshida M, Naya Y, Miyashita Y. Anatomical organization of forward fiber projections from area
TE to perirhinal neurons representing visual long-term memory in monkeys. Proceedings of the
National Academy of Sciences of the United States of America, 100:4257-4262, 2003. [online]
209. Penfold RB, Wang W, Pajer K, Strange B, Kelleher KJ. Spatio-temporal clusters of new
psychotropic medications among Michigan children insured by Medicaid. Pharmacoepidemiology
and Drug Safety, 18:531-539, 2009.
210. Brownstein JS, Green TC, Cassidy TA, Butler SF. Geographic information systems and
pharmacoepidemiology: using spatial cluster detection to monitor local patterns of prescription
opioid abuse. Pharmacoepidemiology and Drug Safety, 19:627-637, 2010.
211. King M, Essic C. The geography of antidepressant, antipsychotic, and stimulant utilization in the
United States, Health and Place, 20:32-38, 2013.
212. Atwell JE, Van Otterloo J, Zipprich J, Winter K, Harriman K, Salmon DA, Halsey NA, Omer SB.
Nonmedical vaccine exemptions and pertussis in California 2010. Pediatrics, 132:624-630, 2013.
[online]
Obesity
216. Dahly D. Obesity clustering in Cebu, Philippines: an application of satscan and the spatial scan
statistic. Journal of Epidemiology and Community Health, 65, A71, 2011. [online]
217. Chalkias C, Papadopoulos AG, Mpenekos G, Tambalis K, Psarra G, Sidossis L. Spatial variability of
childhood obesity in response to socioeconomic heterogeneity. The case of Athens metropolitan area,
Greece. Proceedings of the 17th European Colloquium on Quantitative and Theoretical Geography,
605-611, 2011. [online]
222. Warden R. Comparison of Poisson and Bernoulli spatial cluster analyses of pediatric injuries in a fire
district. International Journal of Health Geographics, 7:51, 2008. [online]
223. Mesoudi A. The cultural dynamics of copycat suicide. PLoS One, 4:e7252, 2009. [online]
224. Dey AN, Hicks P, Benoit S, Tokars JI. Automated monitoring of clusters of falls associated with
severe winter weather using the BioSense system. Injury Prevention, 16:403-407, 2010.
225. Saman DM, Cole HP, Odoi A, Myers ML, Carey DI, Westneat SC. A spatial cluster analysis of
tractor overturns in Kentucky from 1960 to 2002. PLoS One, 7:e30532, 2012. [online]
226. Amin R, Ritter EK, Cossette L. A Geospatial Analysis of Shark Attack Rates for the Coast of
California: 1994-2010. Journal of Environment and Ecology, 3:246-255, 2012.
227. Fuchs S, Ornetsmüller C, Totschnig R. Spatial scan statistics in vulnerability assessment: an
application to mountain hazards. Natural Hazards 64:2129-2151, 2012.
228. Campo J. Firearm deaths in Washington State. Washington State Health Services Research Brief No.
71, 2013. [online]
229. Jones P, Gunnell D, Platt S, Scourfield J, Lloyd K, Huxley P, John A, Kamran B, Wells C, Dennis
M. Identifying Probable Suicide Clusters in Wales Using National Mortality Data, PLoS One,
8:e71713, 2013. [online]
Demography
230. Collado Chaves A. Fecundidad adolescente en el gran área metropolitana de Costa Rica. Población y
Salud en Mesoamérica, 1:4, 2003. [online]
240. Berke O, Grosse Beilage E. Spatial relative risk mapping of pseudorabies-seropositive pig herds in
an animal-dense region. Journal of Veterinary Medicine, B50:322-325, 2003.
241. Abrial D, Calavas D, Lauvergne N, Morignat E, Ducrot C. Descriptive spatial analysis of BSE in
western France. Veterinary Research, 34:749-60, 2003.
242. Sheridan HA, McGrath G, White P, Fallon R, Shoukri MM, Martin SW. A temporal-spatial analysis
of bovine spongiform encephalopathy in Irish cattle herds, from 1996 to 2000. Canadian Journal of
Veterinary Research, 69:19-25, 2005. [online]
243. Guerin MT, Martin SW, Darlington GA, Rajic A. A temporal study of Salmonella serovars in
animals in Alberta between 1990 and 2001. Canadian Journal of Veterinary Research, 69:88-89,
2005. [online]
244. Allepuz A, López-Quílez A, Forte A, Fernández G, Casal J. Spatial analysis of bovine spongiform
encephalopathy in Galicia, Spain (2000-2005). Preventive Veterinary Medicine, 79:174-85, 2007.
245. Heres L, Brus DJ, Hagenaars TJ. Spatial analysis of BSE cases in the Netherlands. BMC Veterinary
Research, 4:21, 2008. [online]
246. Frossling J, Nodtvedt A, Lindberg A, Björkman C. Spatial analysis of Neospora caninum
distribution in dairy cattle from Sweden. Geospatial Health, 3:39-45, 2008.
Mammalogy
255. Webb NF, Hebblewhite M, Merrill EH. Statistical Methods for Identifying Wolf Kill Sites Using
Global Positioning System Locations. Journal of Wildlife Management, 2008, 72, 798-807.
256. McPhee HM, Webb NF, Merrill EH. Hierarchical predation: Wolf (Canis lupus) selection along hunt
paths and at kill sites. Canadian Journal of Zoology, 2012, 90:555-563.
257. Ouko EO. Where, when and why are there elephant poaching hotspots in Kenya? MSc Thesis,
University of Twente, Netherlands, 2013. [online]
Entomology
258. Porcasi X, Catalá SS, Hrellac H, Scavuzzo MC, Gorla DE. Infestation of Rural Houses by Triatoma
Infestans (Hemiptera: Reduviidae) in Southern Area of Gran Chaco in Argentina. Journal of Medical
Entomology, 43:1060-1067, 2006.
Ichthyology
259. Spindler BD, Chipps SR, Klumb RA, Wimberly MC. Spatial analysis of pallid sturgeon
Scaphirhynchus albus distribution in the Missouri River, South Dakota. Journal of Applied
Ichthyology, 25:8-13, 2009.
Botany
260. Bayon C, Pei MH, Ruiz C, Hunter T. Genetic structure and spatial distribution of the mycoparasite
Sphaerellopsis filum on Melampsora larici-epitea in a short-rotation coppice willow planting. Plant
Pathology, 56:616-623, 2007.
Forestry
261. Coulston JW, Riitters KH. Geographic analysis of forest health indicators Using Spatial Scan
Statistics. Environmental Management, 31: 764-773, 2003.
262. Riitters KH, Coulston JW. Hot spots of perforated forest in the eastern United States. Environmental
Management, 35:483-492, 2005.
263. Tuia D, Ratle F, Lasaponara R, Telesca L, Kanevski M. Scan statistics analysis of forest fire clusters.
Communications in Nonlinear Sciences and Numerical Simulations, 13:1689-94, 2008.
264. Tonini M, Tuia D, Ratle F. Detection of clusters using space-time scan statistics. International Journal of
Wildland Fire, 18:830-836, 2009.
265. Fei S. Applying hotspot detection methods in forestry: A case study of Chestnut Oak regeneration.
International Journal of Forestry Research, 815292, 2010. [online]
266. Vega Orozco C, Tonini M, Conedera M, Kanevski M. Cluster recognition in spatial-temporal
sequences: the case of forest fires. Geoinformatica, 16: 653-673, 2012. [online]
Environment
267. Vadrevu KP. Analysis of fire events and controlling factors in eastern India using spatial scan and
multivariate statistics. Geografiska Annaler, 90A: 315-328, 2008.
268. Sudakin DL, Horowitz Z, Giffin S. Regional variation in the incidence of symptomatic pesticide
exposures: Applications of geographic information systems. Journal of Toxicology - Clinical
Toxicology, 40:767-773, 2002.
269. Krolik J, Maier A, Evans G, Belanger P, Hall G, Joyce A, Majury A. A spatial analysis of private
well water Escherichia coli contamination in southern Ontario. Geospatial Health 8:65-75, 2013.
270. Krolik J, Evans G, Belanger P, Maier A, Hall G, Joyce A, Guimont S, Pelot A, Majury A. Microbial
source tracking and spatial analysis of E. coli contaminated private well waters in southeastern
Ontario. Journal of Water and Health, 12:348-357, 2014.
Geology
271. Gao J, Zhang Z, Hu Y, Bian J, Jiang W, Wang X. Geographical distribution patterns of iodine in
drinking-water and its associations with geological factors in Shandong Province, China. [online]
Natural Disasters
272. Witham CS, Oppenheimer C. Mortality in England during the 1783-4 Laki Craters eruption. Bulletin
of Volcanology, 67:15-25, 2004.
273. Stevenson JR, Emrich CT, Mitchell JT, Cutter SL. Using building permits to monitor disaster
recovery: A spatio-temporal case study of coastal Mississippi following Hurricane Katrina.
Cartography and Geographic Information Science, 37:S57-68, 2010.
War
274. Ziemke J. From battles to massacres. 3rd Annual Harvard-Yale-MIT Graduate Student Conference
on Order, Conflict and Violence, 2008. [online]
275. O'Loughlin J, Witmer F, Linke A. The Afghanistan-Pakistan Wars 2008-2009: Micro-geographies,
Conflict Diffusion, and Clusters of Violence. Eurasian Geography and Economics, 51:437-471, 2010.
[online]
276. O'Loughlin J, Witmer FDW, Linke AM, Thorwardson N. Peering into the Fog of War: The
Geography of the WikiLeaks Afghanistan War Logs, 2004-2009. Eurasian Geography and
Economics, 51:472-495, 2010. [online]
277. O'Loughlin J, Witmer FDW. The Localized Geographies of Violence in the North Caucasus of
Russia, 1999-2007. Annals of the Association of American Geographers, 101:178-201, 2011.
[online]
Criminology
278. Jefferis ES. A multi-method exploration of crime hot spots: SaTScan results. National Institute of
Justice, Crime Mapping Research Center, 1998.
279. Kaminski RJ, Jefferis ES, Chanhatasilpa C. A spatial analysis of American police killed in the line
of duty. In Turnbull et al. (eds.), Atlas of crime: Mapping the criminal landscape. Phoenix, AZ: Oryx
Press, 2000.
280. LeBeau JL. Demonstrating the analytical utility of GIS for police operations: A final report. National
Criminal Justice Reference Service, 2000. [online]
281. Beato Filho CC, Assunção RM, Silva BF, Marinho FC, Reis IA, Almeida MC. Homicide clusters
and drug traffic in Belo Horizonte, Minas Gerais, Brazil from 1995 to 1999. Cadernos de Saúde
Pública, 17:1163-1171, 2001. [online]
282. Ceccato V, Haining R. Crime in border regions: The Scandinavian case of Øresund, 1998-2001.
Annals of the Association of American Geographers, 94:807-826, 2004.
283. Ceccato V. Homicide in Sao Paulo, Brazil: Assessing the spatial-temporal and weather variations.
Journal of Environmental Psychology, 25:307-321, 2005.
284. Minamisava R, Nouer SS, de Morais Neto OL, Melo LK, Andrade ALS. Spatial clusters of violent
deaths in a newly urbanized region of Brazil: Highlighting the social disparities. International
Journal of Health Geographics, 8:66, 2009. [online]
285. Nakaya T, Yano K. Visualising crime clusters in a space-time cube: An exploratory data-analysis
approach using space-time kernel density estimation and scan statistics. Transactions in GIS, 14:223-239, 2010.
286. Leitner M, Helbich M. The Impact of Hurricanes on Crime: A Spatio-temporal Analysis in the City
of Houston, TX. Cartography and Geographic Information Science, 37:214-222, 2011.
287. Zeoli AM, Pizarro JM, Grady SC, Melde C. Homicide as infectious disease: Using public health
methods to investigate the diffusion of homicide. Justice Quarterly, 31:609-632, 2014.
Architecture
290. Kaza N, Lester TW, Rodriguez DA. The spatio-temporal clustering of green buildings in the United
States. Urban Studies, 50:3262-3282, 2013. [online]
Astronomy
293. Marcos RDLF, Marcos CDLF. From star complexes to the field: Open cluster families, 672:342-351, 2008.
294. Bidin CM, Marcos RD, Marcos CD, Carraro G. Not an open cluster after all: the NGC 6863
asterism in Aquila. Astronomy and Astrophysics, 510:A44, 2010. [online]
315. Ranta J, Pitkäniemi J, Karvonen M, et al. Detection of overall space-time clustering in non-uniformly
distributed population. Statistics in Medicine, 15:2561-2572, 1996.
316. Rushton G, Lolonis P. Exploratory Spatial Analysis of Birth Defect Rates in an Urban Population.
Statistics in Medicine, 15:717-726, 1996.
317. Stone RA. Investigation of excess environmental risk around putative sources: statistical problems
and a proposed test. Statistics in Medicine, 7:649-660, 1988.
318. Tango T. A class of tests for detecting 'general' and 'focused' clustering of rare diseases. Statistics in
Medicine, 14:2323-2334, 1995.
319. Tango T. A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine,
19:191-204, 2000.
320. Turnbull B, Iwano EJ, Burnett WS, et al. Monitoring for clusters of disease: application to Leukemia
incidence in upstate New York. American Journal of Epidemiology, 132:S136-143, 1990.
321. Waller LA, Turnbull BW, Clark LC, Nasca P. Chronic disease surveillance and testing of clustering
of disease and exposure. Environmetrics, 3:281-300, 1992.
322. Walter SD. A simple test for spatial pattern in regional health data. Statistics in Medicine, 13:1037-1044, 1994.
323. Whittemore AS, Friend N, Brown BW, Holly EA. A test to detect clusters of disease. Biometrika,
74:631-635, 1987.