Bayesian model-based clustering is a widely applied procedure for discovering groups of related observations in a dataset. These approaches use Bayesian mixture models, estimated with MCMC, which provide posterior samples of the model parameters and clustering partition. While inference on model parameters is well established, inference on the clustering partition is less developed. A new method is developed for estimating the optimal partition from the pairwise posterior similarity matrix generated by a Bayesian cluster model. This approach uses non-negative matrix factorization (NMF) to provide a low-rank approximation to the similarity matrix. The factorization permits hard or soft partitions and is shown to perform better than several popular alternatives under a variety of penalty functions.
Across the nation, researchers and transportation engineers are developing safety performance functions (SPFs) to predict crash rates and develop crash modification factors to improve traffic safety at roadway segments and intersections. Generalized linear models (GLMs), such as Poisson or negative binomial regression, are most commonly used to develop SPFs with annual average daily traffic as the primary roadway characteristic to predict crashes. However, while more complex to interpret, data mining models such as boosted regression trees have improved upon GLMs' crash prediction performance due to their ability to handle more data characteristics, accommodate non-linearities, and include interaction effects between the characteristics. An intersection data inventory of 36 safety-relevant parameters for three- and four-legged non-signalized intersections along state routes in Alabama was used to study the importance of intersection characteristics on crash rate and the interaction effects between key characteristics. Four different SPFs were investigated and compared: Poisson regression, negative binomial regression, regularized generalized linear model, and boosted regression trees. The models did not agree on which intersection characteristics were most related to the crash rate. The boosted regression tree model significantly outperformed the other models and identified several intersection characteristics as having strong interaction effects.
The object of this paper is to develop a statistical approach to criminal linkage analysis that discovers and groups crime events that share a common offender and prioritizes suspects for further investigation. Bayes factors are used to describe the strength of evidence that two crimes are linked. Using concepts from agglomerative hierarchical clustering, the Bayes factors for crime pairs are combined to provide similarity measures for comparing two crime series. This facilitates crime series clustering, crime series identification, and suspect prioritization. The ability of our models to make correct linkages and predictions is demonstrated under a variety of real-world scenarios with a large number of solved and unsolved breaking and entering crimes. For example, a naive Bayes model for pairwise case linkage can identify 82% of actual linkages with a 5% false positive rate. For crime series identification, 74%-89% of the additional crimes in a crime series can be identified from a ranked list of 50 incidents.
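As an illustration of how a naive Bayes Bayes factor for a crime pair can be formed, the sketch below sums per-feature log likelihood ratios under an independence assumption. The feature names and the match probabilities are hypothetical placeholders, not values from the paper, and the paper's actual evidence variables and estimation procedure may differ.

```python
import math

def log_bayes_factor(evidence, p_linked, p_unlinked):
    """Log Bayes factor for one crime pair under a naive Bayes assumption.

    evidence   : dict mapping feature name -> True if the pair matches on it
    p_linked   : dict of P(match | crimes share an offender)   (hypothetical)
    p_unlinked : dict of P(match | crimes have different offenders) (hypothetical)
    """
    total = 0.0
    for feature, matched in evidence.items():
        pl = p_linked[feature] if matched else 1.0 - p_linked[feature]
        pu = p_unlinked[feature] if matched else 1.0 - p_unlinked[feature]
        total += math.log(pl / pu)  # independent features: log ratios add
    return total

# Hypothetical match probabilities, for illustration only.
P_LINKED = {"same_entry_method": 0.7, "within_1km": 0.6, "same_time_of_day": 0.5}
P_UNLINKED = {"same_entry_method": 0.3, "within_1km": 0.05, "same_time_of_day": 0.25}

pair_a = {"same_entry_method": True, "within_1km": True, "same_time_of_day": True}
pair_b = {"same_entry_method": False, "within_1km": False, "same_time_of_day": False}

lbf_a = log_bayes_factor(pair_a, P_LINKED, P_UNLINKED)  # positive: favors linkage
lbf_b = log_bayes_factor(pair_b, P_LINKED, P_UNLINKED)  # negative: favors no linkage
```

A positive log Bayes factor favors the linked hypothesis and a negative one favors the unlinked hypothesis; pairwise values like these are what a hierarchical clustering step could then combine into series-level similarities.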
Statistical clustering of criminal events can be used by crime analysts to create lists of potential suspects for an unsolved crime, identify groups of crimes that may have been committed by the same individual or group of individuals, profile offenders, and predict future events. In this paper, we propose a Bayesian model-based clustering approach for criminal events. Our approach is semi-supervised because the offender is known for a subset of the events, and utilizes spatiotemporal crime locations as well as crime features describing the offender’s modus operandi. The hierarchical model naturally handles complex features often seen in crime data, including missing data, interval-censored event times, and a mix of discrete and continuous variables. In addition, our Bayesian model produces posterior clustering probabilities which allow analysts to act on model output only as warranted. We illustrate the approach using a large data set of burglaries in 2009-2010 in Baltimore County, Maryland.
Purpose
Behavioural crime linkage is underpinned by two assumptions: (a) that offenders exhibit some degree of consistency in the way they commit offences (their modus operandi [MO]); and, (b) that offenders can be differentiated on the basis of their offence behaviour. The majority of existing studies sample at most three crimes from an offender's series of detected crimes and do not examine whether patterns differ across offenders. Here, we examine patterns observed across the entire detected series of each sampled offender, and assess how homogeneous patterns are across offenders.
Methods
Using a non-parametric resampling approach, we analyse the entire crime series of 153 prolific burglars to determine if they exhibit consistency and specificity in the way they commit offences.
Results
Findings suggest that offenders exhibit consistency in the way they commit offences. With respect to specificity, our results suggest that patterns are not homogeneous across offenders or the type of MO considered – some offenders exhibit more specificity than do others, and offenders are more distinctive for some aspects of their MO (particularly spatial choices) than they are for others.
Conclusions
The findings provide support for the underlying principles of crime linkage, but suggest that some aspects of an offender's MO either conform to a common preference, or are perhaps more influenced by situational factors than stable scripted preferences. That some offenders fail to demonstrate sufficient specificity for accurate linkage suggests that identifying which crimes are likely to be the work of offenders who display more specificity a priori constitutes one challenge for future research of this kind.
The use of graphics processing unit (GPU) parallel processing is becoming a part of mainstream statistical practice. The reliance of Bayesian statistics on Markov Chain Monte Carlo (MCMC) methods makes the applicability of parallel processing not immediately obvious. It is illustrated that there are substantial gains in computational time for MCMC and other methods of evaluation by computing the likelihood using GPU parallel processing. Examples use data from the Global Terrorism Database to model terrorist activity in Colombia from 2000 through 2010 and a likelihood based on the explicit convolution of two negative-binomial processes. Results show decreases in computational time by a factor of over 200. Factors influencing these improvements and guidelines for programming parallel implementations of the likelihood are discussed.
Objective:
This article explores patterns of terrorist activity over the period from 2000 through 2010 across three target countries: Indonesia, the Philippines and Thailand.
Methods:
We use self-exciting point process models to create interpretable and replicable metrics for three key terrorism concepts: risk, resilience and volatility, as defined in the context of terrorist activity.
Results:
Analysis of the data shows significant and important differences in the risk, volatility and resilience metrics over time across the three countries. For the three countries analysed, we show that risk varied on a scale from 0.005 to 1.61 “expected terrorist attacks per day”, volatility ranged from 0.820 to 0.994 “additional attacks caused by each attack”, and resilience, as measured by the number of days until risk subsides to a pre-attack level, ranged from 19 to 39 days. We find that of the three countries, Indonesia had the lowest average risk and volatility, and the highest level of resilience, indicative of the relatively sporadic nature of terrorist activity in Indonesia. The high terrorism risk and low resilience in the Philippines were a function of the more intense, less clustered pattern of terrorism than was evident in Indonesia.
Conclusions:
Mathematical models hold great promise for creating replicable, reliable and interpretable “metrics” for key terrorism concepts such as risk, resilience and volatility.
A predictive model of terrorist activity is developed by examining the daily number of terrorist attacks in Indonesia from 1994 through 2007. The dynamic model employs a shot noise process to explain the self-exciting nature of the terrorist activities. This estimates the probability of future attacks as a function of the times since the past attacks. In addition, the excess of nonattack days coupled with the presence of multiple coordinated attacks on the same day compelled the use of hurdle models to jointly model the probability of an attack day and corresponding number of attacks. A power law distribution with a shot noise driven parameter best modeled the number of attacks on an attack day. Interpretation of the model parameters is discussed and predictive performance of the models is evaluated.
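To make the idea of a shot noise driven attack probability concrete, here is a minimal sketch. It assumes an exponentially decaying contribution from each past attack and maps the resulting intensity to a daily attack probability via 1 - exp(-intensity). Both the kernel shape and that link, as well as the parameter values `baseline`, `jump`, and `decay`, are illustrative assumptions and not the fitted model from the paper.

```python
import math

def shot_noise_intensity(day, past_attack_days, baseline=0.05, jump=0.4, decay=0.2):
    """Baseline level plus an exponentially decaying 'shot' from each earlier attack."""
    excitation = sum(
        jump * math.exp(-decay * (day - s)) for s in past_attack_days if s < day
    )
    return baseline + excitation

def prob_attack_day(day, past_attack_days, **params):
    """Assumed hurdle component: probability that at least one attack occurs on `day`."""
    return 1.0 - math.exp(-shot_noise_intensity(day, past_attack_days, **params))

p_soon = prob_attack_day(11, [10])   # one day after an attack
p_late = prob_attack_day(40, [10])   # thirty days after the same attack
p_none = prob_attack_day(40, [])     # no attack history at all
```

Under these assumptions the probability is elevated immediately after an attack and decays back toward the baseline as time since the last attack grows, which is the self-exciting behavior the abstract describes.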
We present a technique to represent the structure of large social networks through ego-centered network neighborhoods. This provides a local view of the network, focusing on the vertices and their kth-order neighborhoods, allowing discovery of interesting patterns and features of the network that would be hidden in a global network analysis. We present several examples from a corporate phone call network revealing the ability of our methods to discover interesting network behavior that is only available at the local level. In addition, we present an approach to use these concepts to identify abrupt or subtle anomalies in dynamic networks.
In order to assess the effectiveness of counter-terrorism interventions, terrorism must first be quantitatively measured and appropriate statistical tests developed. By combining aspects of both the frequency and impact of terrorist attacks, we describe how a marked point process framework can establish a comprehensive measure of terrorism. In addition, we show how change point analysis can provide a powerful alternative to intervention analysis in assessing the effectiveness of counter-terrorism efforts. This is illustrated by examining the influence that Detachment 88, a specialized counter-terrorism unit, had on the terrorism process in Indonesia.
One aspect of tactical crime or terrorism analysis is predicting the location of the next event in a series. The objective of this article is to present a methodology to identify the optimal parameters for and test the performance of temporally weighted kernel density estimation models for predicting the next event in a criminal or terrorist event series. By placing event series in a space-time point pattern framework, the next event prediction models are shown to be based on estimating a conditional spatial density function. We use temporal weights that indicate how much influence past events have toward predicting future event locations and can also incorporate uncertainty in the event timing. Results of applying this methodology to crime series in Baltimore County, MD indicate that performance can vary greatly by crime type, little by series length, and is fairly robust to choice of bandwidth.
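A temporally weighted kernel density estimate can be sketched as follows. The Gaussian spatial kernel and the exponential (half-life) temporal weights are assumptions chosen for illustration; the abstract does not state which kernel or weight function the article uses, and the function name `tw_kde` and its parameters are hypothetical.

```python
import math

def tw_kde(x, y, events, now, bandwidth=1.0, half_life=30.0):
    """Temporally weighted Gaussian KDE evaluated at location (x, y).

    events    : list of (ex, ey, t) tuples, t in the same time units as `now`
    half_life : assumed time after which an event's weight is halved
    """
    weights = [0.5 ** ((now - t) / half_life) for _, _, t in events]
    total = sum(weights)
    density = 0.0
    for (ex, ey, _), w in zip(events, weights):
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        kernel = math.exp(-d2 / (2 * bandwidth ** 2)) / (2 * math.pi * bandwidth ** 2)
        density += w * kernel
    return density / total  # weights are normalized so the surface integrates to 1

# One old event at the origin, one recent event at (10, 10).
series = [(0.0, 0.0, 0.0), (10.0, 10.0, 90.0)]
d_recent = tw_kde(10.0, 10.0, series, now=100.0)
d_old = tw_kde(0.0, 0.0, series, now=100.0)
```

With these weights the predicted density is higher near the recent event than near the old one, so the next-event prediction leans toward where the series has been active lately.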
A method is presented for detecting changes to the distribution of a criminal or terrorist point process between two time periods using a non-model-based approach. By treating the criminal/terrorist point process as an intelligent site selection problem, changes to the process can signify changes in the behavior or activity level of the criminals/terrorists. The locations of past events and an associated vector of geographic, environmental, and socio-economic feature values are employed in the analysis. By modeling the locations of events in each time period as a marked point process, we can then detect differences in the intensity of each component process. A modified PRIM (patient rule induction method) is implemented to partition the high-dimensional feature space, which can include mixed variables, into the most likely change regions. Monte Carlo simulations are easily and quickly generated under random relabeling to test a scan statistic for significance. By detecting local regions of change, not only can it be determined if change has occurred in the study area, but the specific spatial regions where change occurs are also identified. An example is provided of breaking and entering crimes over two time periods to demonstrate the use of this technique for detecting local regions of change. This methodology also applies to detecting regions of differences between two types of events such as in case–control data.
The main goal of this research is to generate a methodological framework for the statistical detection of change in an intelligent site selection (ISS) process. An ISS process is one in which an actor judiciously selects the location and time to initiate an event according to their preferences or perceived utility of that location and time. A fundamental difference between an ISS process and other space-time point processes is its dependence on the realization of some external covariate processes.
A methodological framework is established for the statistical detection of anomalies between two spatial ISS point processes. The two processes could represent two time periods or two types of events such as case-control. By modeling the locations of events in each process as a marked point process, we can then detect differences in the intensity of each component process. A modified PRIM (patient rule induction method) is implemented to partition the high dimensional feature space, which can include mixed variables, into the most likely change regions. Monte Carlo simulations are easily and quickly generated under random relabeling to test a scan statistic for significance. By detecting local regions of change, not only can it be determined if change has occurred in the study area, but the specific region where change occurs is also identified.
Next, consideration of ISS anomaly detection is expanded to the surveillance problem. Instead of fixing the time period to test for change, we now perform sequential analysis to quickly detect when an anomaly occurs and the corresponding change region. Difficulties arise for several reasons: the high-dimensional complex ISS models are of an unknown form or have unknown parameters both pre and post change, the anomalous region is unknown, the time of the change is unknown, and multiple hypothesis testing problems will result from both searching over all possible change regions and change times. A likelihood based methodology is developed that addresses these difficulties. This method expands on some common change detection methods (such as CUSUM, SR, and GLR) while maintaining their simplicity and recursive computation even under multiple unknown parameters. We discuss the derivation of our procedure along with several properties of our methodology related to standard change detection criteria.
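For readers unfamiliar with the baseline methods named above, the sketch below shows the textbook CUSUM recursion for Poisson counts with known pre-change and post-change rates. It only illustrates the simple recursive computation that the proposed methodology is said to preserve. It does not handle unknown parameters or unknown change regions, and the rates, threshold, and data are made up.

```python
import math

def poisson_cusum(counts, rate0, rate1, threshold):
    """Textbook CUSUM: S_t = max(0, S_{t-1} + log-likelihood ratio of x_t).

    Returns (index of first alarm or None, list of S_t values).
    """
    s = 0.0
    path = []
    alarm = None
    for t, x in enumerate(counts):
        # Log likelihood ratio of Poisson(rate1) versus Poisson(rate0) for count x.
        llr = x * math.log(rate1 / rate0) - (rate1 - rate0)
        s = max(0.0, s + llr)
        path.append(s)
        if alarm is None and s >= threshold:
            alarm = t
    return alarm, path

# Made-up daily counts: the rate rises after the fifth observation.
counts = [2, 1, 3, 2, 2, 6, 7, 5, 8, 6]
alarm, path = poisson_cusum(counts, rate0=2.0, rate1=6.0, threshold=5.0)
quiet_alarm, _ = poisson_cusum(counts[:5], rate0=2.0, rate1=6.0, threshold=5.0)
```

Each step needs only the previous statistic and the new observation, which is what makes the procedure suitable for sequential surveillance.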
Early detection of disease outbreaks is of paramount importance to implementing intervention strategies to mitigate the severity and duration of the outbreak. We build methodology that utilizes the characteristic profile of disease outbreaks to reduce the time to detection and false positive rate. We model daily counts through a Poisson distribution with additive background plus outbreak components. The outbreak component has a parametric form with unknown underlying parameters. A mixture likelihood ratio scan statistic is developed to maximize parameters over a window in time. This provides an alert statistic with early time to detection and low false positive rate. The methodology is demonstrated on three simulated data sets meant to represent E. coli, Cryptosporidium, and Influenza outbreaks.
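A simplified version of a likelihood ratio scan over time windows might look like the sketch below. It assumes a known constant background rate and a flat additive outbreak level within each trailing window, maximized by its MLE. This is a deliberately reduced stand-in: the paper's outbreak component has a parametric profile and uses a mixture likelihood ratio, neither of which is reproduced here, and the data are invented.

```python
import math

def scan_statistic(counts, background, max_window=7):
    """Largest Poisson log likelihood ratio over trailing windows ending today.

    Model per window: counts ~ Poisson(background + delta), delta >= 0 (flat outbreak).
    Returns (best log likelihood ratio, window length achieving it).
    """
    best, best_len = 0.0, 0
    for n in range(1, min(max_window, len(counts)) + 1):
        window = counts[-n:]
        delta = max(0.0, sum(window) / n - background)  # MLE of the outbreak level
        llr = sum(window) * math.log((background + delta) / background) - n * delta
        if llr > best:
            best, best_len = llr, n
    return best, best_len

quiet = [5, 4, 6, 5, 5, 4, 6]        # made-up counts near the background rate
outbreak = [5, 4, 6, 5, 9, 12, 15]   # made-up counts with a rise in the last three days

stat_quiet, len_quiet = scan_statistic(quiet, background=5.0)
stat_out, len_out = scan_statistic(outbreak, background=5.0)
```

An alert would be raised when the statistic exceeds a threshold calibrated to the desired false positive rate; here the outbreak series yields a much larger statistic, attained on the three-day window covering the rise.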
We consider the problem of modeling the site selection behavior of criminal and terrorist actors. These site selection processes are non-stationary and can exhibit slow to abrupt changes (e.g. due to police and military actions). Such processes also complicate feature selection, as different features may be important at different times. Furthermore, timely responses are desired, necessitating a model that can rapidly assess new data and make relevant predictions, incorporating the possibility of radical shifts in the actors’ tactics. This paper describes the use of the fused lasso to model these dynamic processes over time. We develop a sequential fused lasso algorithm that incorporates automatic feature selection, adaptation to process changes and discovery of change points.
Bayesian model-based clustering is a widely applied procedure for discovering groups of related o... more Bayesian model-based clustering is a widely applied procedure for discovering groups of related observations in a dataset. These approaches use Bayesian mixture models, estimated with MCMC, which provide posterior samples of the model parameters and clustering partition. While inference on model parameters is well established, inference on the clustering partition is less developed. A new method is developed for estimating the optimal partition from the pairwise posterior similarity matrix generated by a Bayesian cluster model. This approach uses non-negative matrix factorization (NMF) to provide a low-rank approximation to the similarity matrix. The factorization permits hard or soft partitions and is shown to perform better than several popular alternatives under a variety of penalty functions.
Across the nation, researchers and transportation engineers are developing safety performance fun... more Across the nation, researchers and transportation engineers are developing safety performance functions (SPFs) to predict crash rates and develop crash modification factors to improve traffic safety at roadway segments and intersections. Generalized linear models (GLMs), such as Poisson or negative binomial regression, are most commonly used to develop SPFs with annual average daily traffic as the primary roadway characteristic to predict crashes. However, while more complex to interpret, data mining models such as boosted regression trees have improved upon GLMs crash prediction performance due to their ability to handle more data characteristics, accommodate non-linearities, and include interaction effects between the characteristics. An intersection data inventory of 36 safety relevant parameters for three-and four-legged non-signalized intersections along state routes in Alabama was used to study the importance of intersection characteristics on crash rate and the interaction effects between key characteristics. Four different SPFs were investigated and compared: Poisson regression, negative binomial regression, regularized generalized linear model, and boosted regression trees. The models did not agree on which intersection characteristics were most related to the crash rate. The boosted regression tree model significantly outperformed the other models and identified several intersection characteristics as having strong interaction effects.
The object of this paper is to develop a statistical approach to criminal linkage analysis that d... more The object of this paper is to develop a statistical approach to criminal linkage analysis that discovers and groups crime events that share a common offender and prioritizes suspects for further investigation. Bayes factors are used to describe the strength of evidence that two crimes are linked. Using concepts from agglomerative hierarchical clustering, the Bayes factors for crime pairs are combined to provide similarity measures for comparing two crime series. This facilitates crime series clustering, crime series identification, and suspect prioritization. The ability of our models to make correct linkages and predictions is demonstrated under a variety of real-world scenarios with a large number of solved and unsolved breaking and entering crimes. For example, a naive Bayes model for pairwise case linkage can identify 82% of actual linkages with a 5% false positive rate. For crime series identification, 74%-89% of the additional crimes in a crime series can be identified from a ranked list of 50 incidents.
Statistical clustering of criminal events can be used by crime analysts to create of lists of pot... more Statistical clustering of criminal events can be used by crime analysts to create of lists of potential suspects for an unsolved crime, identify groups of crimes that may have been committed by the same individuals or group of individuals, for offender profiling, and for predicting future events. In this paper, we propose a Bayesian model-based clustering approach for criminal events. Our approach is semi-supervised because the offender is known
for a subset of the events, and utilizes spatiotemporal crime locations as well as crime features describing the offender’s modus operandi. The hierarchical model naturally handles complex features often seen in crime data, including missing data, interval censored event times, and a mix of discrete and continuous variables. In addition, our Bayesian model produces posterior clustering probabilities which allow analysts to act on model output only as warranted. We illustrate the approach using a large data set of burglaries in 2009-2010 in Baltimore County, Maryland.
Purpose
Behavioural crime linkage is underpinned by two assumptions: (a) that offenders exhibit ... more Purpose
Behavioural crime linkage is underpinned by two assumptions: (a) that offenders exhibit some degree of consistency in the way they commit offences (their modus operandi [MO]); and, (b) that offenders can be differentiated on the basis of their offence behaviour. The majority of existing studies sample at most three crimes from an offender's series of detected crimes and do not examine whether patterns differ across offenders. Here, we examine patterns observed across the entire detected series of each sampled offender, and assess how homogeneous patterns are across offenders.
Methods
Using a non-parametric resampling approach, we analyse the entire crime series of 153 prolific burglars to determine if they exhibit consistency and specificity in the way they commit offences.
Results
Findings suggest that offenders exhibit consistency in the way they commit offences. With respect to specificity, our results suggest that patterns are not homogeneous across offenders or the type of MO considered – some offenders exhibit more specificity than do others, and offenders are more distinctive for some aspects of their MO (particularly spatial choices) than they are for others.
Conclusions
The findings provide support for the underlying principles of crime linkage, but suggest that some aspects of an offender's MO either conform to a common preference, or are perhaps more influenced by situational factors than stable scripted preferences. That some offenders fail to demonstrate sufficient specificity for accurate linkage suggests that identifying which crimes are likely to be the work of offenders who display more specificity a priori constitutes one challenge for future research of this kind.
The use of graphical processing unit (GPU) parallel processing is becoming a part of mainstream s... more The use of graphical processing unit (GPU) parallel processing is becoming a part of mainstream statistical practice. The reliance of Bayesian statistics on Markov Chain Monte Carlo (MCMC) methods makes the applicability of parallel processing not immediately obvious. It is illustrated that there are substantial gains in improved computational time for MCMC and other methods of evaluation by computing the likelihood using GPU parallel processing. Examples use data from the Global Terrorism Database to model terrorist activity in Colombia from 2000 through 2010 and a likelihood based on the explicit convolution of two negative-binomial processes. Results show decreases in computational time by a factor of over 200. Factors influencing these improvements and guidelines for programming parallel implementations of the likelihood are discussed.
Objective:
This article explores patterns of terrorist activity over the period from 2000 throug... more Objective:
This article explores patterns of terrorist activity over the period from 2000 through 2010 across three target countries: Indonesia, the Philippines and Thailand.
Methods:
We use self-exciting point process models to create interpretable and replicable metrics for three key terrorism concepts: risk, resilience and volatility, as defined in the context of terrorist activity.
Results:
Analysis of the data shows significant and important differences in the risk, volatility and resilience metrics over time across the three countries. For the three countries analysed, we show that risk varied on a scale from 0.005 to 1.61 “expected terrorist attacks per day”, volatility ranged from 0.820 to 0.994 “additional attacks caused by each attack”, and resilience, as measured by the number of days until risk subsides to a pre-attack level, ranged from 19 to 39 days. We find that of the three countries, Indonesia had the lowest average risk and volatility, and the highest level of resilience, indicative of the relatively sporadic nature of terrorist activity in Indonesia. The high terrorism risk and low resilience in the Philippines was a function of the more intense, less clustered pattern of terrorism than what was evident in Indonesia.
Conclusions:
Mathematical models hold great promise for creating replicable, reliable and interpretable “metrics” to key terrorism concepts such as risk, resilience and volatility.
A predictive model of terrorist activity is developed by examining the daily number of terrorist ... more A predictive model of terrorist activity is developed by examining the daily number of terrorist attacks in Indonesia from 1994 through 2007. The dynamic model employs a shot noise process to explain the self-exciting nature of the terrorist activities. This estimates the probability of future attacks as a function of the times since the past attacks. In addition, the excess of nonattack days coupled with the presence of multiple coordinated attacks on the same day compelled the use of hurdle models to jointly model the probability of an attack day and corresponding number of attacks. A power law distribution with a shot noise driven parameter best modeled the number of attacks on an attack day. Interpretation of the model parameters is discussed and predictive performance of the models is evaluated.
We present a technique to represent the structure of large social networks through ego-centered n... more We present a technique to represent the structure of large social networks through ego-centered network neighborhoods. This provides a local view of the network, focusing on the vertices and their kth order neighborhoods allowing discovery of interesting patterns and features of the network that would be hidden in a global network analysis. We present several examples from a corporate phone call network revealing the ability of our methods to discover interesting network behavior that is only available at the local level. In addition, we present an approach to use these concepts to identify abrupt or subtle anomalies in dynamic networks.
In order to assess the effectiveness of counter-terrorism interventions, terrorism must First be ... more In order to assess the effectiveness of counter-terrorism interventions, terrorism must First be quantitatively measured and appropriate statistical tests developed. By combining aspects of both the frequency and impact of terrorist attacks, we describe how a marked point process framework can establish a comprehensive measure of terrorism. In addition, we show how change point analysis can provide a powerful alternative to intervention analysis in assessing the effectiveness of counter-terrorism efforts. This is illustrated by examining the influence Detachment 88, a specialized counter-terrorism unit, had on the terrorism process in Indonesia.
One aspect of tactical crime or terrorism analysis is predicting the location of the next event i... more One aspect of tactical crime or terrorism analysis is predicting the location of the next event in a series. The objective of this article is to present a methodology to identify the optimal parameters for and test the performance of temporally weighted kernel density estimation models for predicting the next event in a criminal or terrorist event series. By placing event series in a space-time point pattern framework, the next event prediction models are shown to be based on estimating a conditional spatial density function. We use temporal weights that indicate how much influence past events have toward predicting future event locations and can also incorporate uncertainty in the event timing. Results of applying this methodology to crime series in Baltimore County, MD indicate that performance can vary greatly by crime type, little by series length, and is fairly robust to choice of bandwidth.
A method is presented for detecting changes to the distribution of a criminal or terrorist point ... more A method is presented for detecting changes to the distribution of a criminal or terrorist point process between two time periods using a non-model-based approach. By treating the criminal/terrorist point process as an intelligent site selection problem, changes to the process can signify changes in the behavior or activity level of the criminals/terrorists. The locations of past events and an associated vector of geographic, environmental, and socio-economic feature values are employed in the analysis. By modeling the locations of events in each time period as a marked point process, we can then detect differences in the intensity of each component process. A modified PRIM (patient rule induction method) is implemented to partition the high-dimensional feature space, which can include mixed variables, into the most likely change regions. Monte Carlo simulations are easily and quickly generated under random relabeling to test a scan statistic for significance. By detecting local regions of change, not only can it be determined if change has occurred in the study area, but the specific spatial regions where change occurs is also identified. An example is provided of breaking and entering crimes over two-time periods to demonstrate the use of this technique for detecting local regions of change. This methodology also applies to detecting regions of differences between two types of events such as in case–control data.
The main goal of this research is to generate a methodological framework for the statistical detection of change in an intelligent site selection (ISS) process. An ISS process is one in which an actor judiciously selects the location and time to initiate an event according to their preferences or perceived utility of that location and time. A fundamental difference between an ISS process and other space-time point processes is its dependence on the realization of some external covariate processes.
A methodological framework is established for the statistical detection of anomalies between two spatial ISS point processes. The two processes could represent two time periods or two types of events such as case-control. By modeling the locations of events in each process as a marked point process, we can then detect differences in the intensity of each component process. A modified PRIM (patient rule induction method) is implemented to partition the high-dimensional feature space, which can include mixed variables, into the most likely change regions. Monte Carlo simulations are easily and quickly generated under random relabeling to test a scan statistic for significance. By detecting local regions of change, not only can it be determined if change has occurred in the study area, but the specific region where change occurs is also identified.
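The random-relabeling significance test can be sketched as follows. This is a simplified stand-in, not the method of the work itself: the candidate regions are supplied as fixed boolean masks rather than found by a modified PRIM search, and the scan statistic (largest deviation of a region's share of process-B events from the overall share) is a hypothetical choice made for the example.

```python
import numpy as np

def relabeling_p_value(labels, region_masks, n_sims=999, seed=0):
    """Monte Carlo p-value for a simple scan statistic under random relabeling.

    labels       : 0/1 array, which of the two processes each event belongs to
    region_masks : boolean array (R, N), membership of each event in each
                   candidate region (assumed given; hypothetical input)
    """
    labels = np.asarray(labels, dtype=int)
    masks = np.asarray(region_masks, dtype=bool)
    rng = np.random.default_rng(seed)
    region_sizes = np.maximum(masks.sum(axis=1), 1)  # avoid division by zero

    def scan_stat(lab):
        # Share of process-B events inside each region vs. the overall share.
        share_inside = (masks @ lab) / region_sizes
        return np.max(np.abs(share_inside - lab.mean()))

    observed = scan_stat(labels)
    # Relabeling keeps event locations fixed and permutes process labels.
    simulated = np.array([scan_stat(rng.permutation(labels)) for _ in range(n_sims)])
    # The observed statistic is counted in the reference set.
    p_value = (1 + np.sum(simulated >= observed)) / (n_sims + 1)
    return observed, p_value
```

Because the maximum over regions is recomputed for every relabeling, the p-value accounts for having searched over all candidate regions.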
Next, consideration of ISS anomaly detection is expanded to the surveillance problem. Instead of fixing the time period to test for change, we now perform sequential analysis to quickly detect when an anomaly occurs and the corresponding change region. Difficulties arise for several reasons: the high-dimensional complex ISS models are of an unknown form or have unknown parameters both pre- and post-change, the anomalous region is unknown, the time of the change is unknown, and multiple hypothesis testing problems will result from searching over all possible change regions and change times. A likelihood-based methodology is developed that addresses these difficulties. This method expands on some common change detection methods (such as CUSUM, SR, and GLR) while maintaining their simplicity and recursive computation even under multiple unknown parameters. We discuss the derivation of our procedure along with several properties of our methodology related to standard change detection criteria.
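For reference, the recursive computation that makes CUSUM attractive looks like this in its textbook form. The sketch assumes a Gaussian mean shift with known pre- and post-change parameters, which is exactly the simplification the methodology above is designed to move beyond; the function name and arguments are illustrative.

```python
def cusum_alarm_time(xs, mu0, mu1, sigma, threshold):
    """Return the first index (1-based) at which the CUSUM statistic crosses
    the threshold, or None if it never does.

    Textbook case: Gaussian observations with known variance, mean mu0
    before the change and mu1 after it.
    """
    w = 0.0
    for n, x in enumerate(xs, start=1):
        # Log-likelihood ratio of post-change vs. pre-change for one observation.
        llr = (mu1 - mu0) / sigma**2 * (x - 0.5 * (mu0 + mu1))
        # Recursive update: only the previous value of the statistic is needed.
        w = max(0.0, w + llr)
        if w >= threshold:
            return n
    return None
```

Each new observation updates the statistic in constant time, so the monitor never has to revisit past data.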
Early detection of disease outbreaks is of paramount importance to implementing intervention strategies to mitigate the severity and duration of the outbreak. We build methodology that utilizes the characteristic profile of disease outbreaks to reduce the time to detection and false positive rate. We model daily counts through a Poisson distribution with additive background plus outbreak components. The outbreak component has a parametric form with unknown underlying parameters. A mixture likelihood ratio scan statistic is developed to maximize parameters over a window in time. This provides an alert statistic with early time to detection and low false positive rate. The methodology is demonstrated on three simulated data sets meant to represent E. coli, Cryptosporidium, and influenza outbreaks.
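A much-reduced version of a Poisson likelihood ratio scan over time windows is sketched below. It swaps in two simplifications that are not from the paper: the outbreak component is a constant step added to a known, constant background (rather than a parametric outbreak profile), and the statistic is a plain maximum over windows rather than a mixture. The names `poisson_scan_statistic`, `background`, and `max_window` are hypothetical.

```python
import math

def poisson_scan_statistic(counts, background, max_window):
    """Largest log-likelihood ratio over windows ending at the most recent day.

    Model assumed for illustration: counts ~ Poisson(background) with no
    outbreak, Poisson(background + step) inside the window during one.
    Returns (best_llr, best_window_length); (0.0, 0) means no excess found.
    """
    best_llr, best_w = 0.0, 0
    for w in range(1, min(max_window, len(counts)) + 1):
        total = sum(counts[-w:])
        # Maximum likelihood estimate of the step size, constrained to be >= 0.
        step = max(0.0, total / w - background)
        if step == 0.0:
            continue
        # Poisson log-likelihood ratio summed over the window.
        llr = total * math.log((background + step) / background) - w * step
        if llr > best_llr:
            best_llr, best_w = llr, w
    return best_llr, best_w
```

An alert would be raised when the returned statistic exceeds a threshold chosen to control the false positive rate.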
We consider the problem of modeling the site selection behavior of criminal and terrorist actors. These site selection processes are non-stationary and can exhibit slow to abrupt changes (e.g., due to police and military actions). Such processes also complicate feature selection as different features may be important at different times. Furthermore, timely responses are desired, necessitating a model that can rapidly assess new data and make relevant predictions, incorporating the possibility of radical shifts in the actors’ tactics. This paper describes the use of the fused lasso to model these dynamic processes over time. We develop a sequential fused lasso algorithm that incorporates automatic feature selection, adaptation to process changes, and discovery of change points.
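For orientation, a generic fused lasso objective for time-varying coefficients can be written as below. The loss $\ell$, the coefficient vectors $\beta_t$, and the tuning parameters $\lambda_1, \lambda_2$ are notation assumed here; the paper's exact loss function and its sequential algorithm are not given in the abstract.

```latex
\[
\min_{\beta_1,\dots,\beta_T}\;
\sum_{t=1}^{T} \ell\!\left(y_t;\, x_t^{\top}\beta_t\right)
+ \lambda_1 \sum_{t=1}^{T} \lVert \beta_t \rVert_1
+ \lambda_2 \sum_{t=2}^{T} \lVert \beta_t - \beta_{t-1} \rVert_1
\]
```

The $\lambda_1$ term shrinks coefficients to zero, which performs feature selection, while the $\lambda_2$ term favors coefficients that stay constant over time, so the times at which $\beta_t \neq \beta_{t-1}$ serve as candidate change points.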
Papers by Michael D. Porter
for a subset of the events, and utilizes spatiotemporal crime locations as well as crime features describing the offender’s modus operandi. The hierarchical model naturally handles complex features often seen in crime data, including missing data, interval censored event times, and a mix of discrete and continuous variables. In addition, our Bayesian model produces posterior clustering probabilities which allow analysts to act on model output only as warranted. We illustrate the approach using a large data set of burglaries in 2009-2010 in Baltimore County, Maryland.
Behavioural crime linkage is underpinned by two assumptions: (a) that offenders exhibit some degree of consistency in the way they commit offences (their modus operandi [MO]); and, (b) that offenders can be differentiated on the basis of their offence behaviour. The majority of existing studies sample at most three crimes from an offender's series of detected crimes and do not examine whether patterns differ across offenders. Here, we examine patterns observed across the entire detected series of each sampled offender, and assess how homogeneous patterns are across offenders.
Methods
Using a non-parametric resampling approach, we analyse the entire crime series of 153 prolific burglars to determine if they exhibit consistency and specificity in the way they commit offences.
Results
Findings suggest that offenders exhibit consistency in the way they commit offences. With respect to specificity, our results suggest that patterns are not homogeneous across offenders or the type of MO considered – some offenders exhibit more specificity than do others, and offenders are more distinctive for some aspects of their MO (particularly spatial choices) than they are for others.
Conclusions
The findings provide support for the underlying principles of crime linkage, but suggest that some aspects of an offender's MO either conform to a common preference, or are perhaps more influenced by situational factors than stable scripted preferences. That some offenders fail to demonstrate sufficient specificity for accurate linkage suggests that identifying which crimes are likely to be the work of offenders who display more specificity a priori constitutes one challenge for future research of this kind.
This article explores patterns of terrorist activity over the period from 2000 through 2010 across three target countries: Indonesia, the Philippines and Thailand.
Methods:
We use self-exciting point process models to create interpretable and replicable metrics for three key terrorism concepts: risk, resilience and volatility, as defined in the context of terrorist activity.
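As background on self-exciting point processes, the sketch below computes the conditional intensity of a Hawkes process with an exponential kernel. The exponential form and the parameter names `mu`, `alpha`, and `beta` are assumptions for illustration; the abstract does not specify the kernel or how the article derives its metrics from the fitted model.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity (expected events per unit time) at time t.

    mu    : background rate
    alpha : expected number of additional events triggered by each event
            (the kernel alpha * beta * exp(-beta * s) integrates to alpha)
    beta  : decay rate of the excitation after an event
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - ti))
        for ti in event_times
        if ti < t  # only past events contribute
    )
    return mu + excitation
```

In this parameterization the intensity jumps after each event and decays back toward `mu`, and `alpha` must be below 1 for the process to remain stable.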
Results:
Analysis of the data shows significant and important differences in the risk, volatility and resilience metrics over time across the three countries. For the three countries analysed, we show that risk varied on a scale from 0.005 to 1.61 “expected terrorist attacks per day”, volatility ranged from 0.820 to 0.994 “additional attacks caused by each attack”, and resilience, as measured by the number of days until risk subsides to a pre-attack level, ranged from 19 to 39 days. We find that of the three countries, Indonesia had the lowest average risk and volatility, and the highest level of resilience, indicative of the relatively sporadic nature of terrorist activity in Indonesia. The high terrorism risk and low resilience in the Philippines were a function of a more intense, less clustered pattern of terrorism than was evident in Indonesia.
Conclusions:
Mathematical models hold great promise for creating replicable, reliable and interpretable “metrics” for key terrorism concepts such as risk, resilience and volatility.