-
Optimal Forecast Reconciliation with Uncertainty Quantification
Authors:
Jan Kloppenborg Møller,
Peter Nystrup,
Poul G. Hjorth,
Henrik Madsen
Abstract:
We propose to estimate the weight matrix used for forecast reconciliation as parameters in a general linear model in order to quantify its uncertainty. This implies that forecast reconciliation can be formulated as an orthogonal projection from the space of base-forecast errors into a coherent linear subspace. We use variance decomposition together with the Wishart distribution to derive the central estimator for the forecast-error covariance matrix. In addition, we prove that distance-reducing properties apply to the reconciled forecasts at all levels of the hierarchy as well as to the forecast-error covariance. A covariance matrix for the reconciliation weight matrix is derived, which leads to improved estimates of the forecast-error covariance matrix. We show how shrinkage can be introduced in the formulated model by imposing specific priors on the weight matrix and the forecast-error covariance matrix. The method is illustrated in a simulation study that shows consistent improvements in the log-score. Finally, standard errors for the weight matrix and the variance-separation formula are illustrated using a case study of forecasting electricity load in Sweden.
Submitted 9 February, 2024;
originally announced February 2024.
-
Fitting the grain orientation distribution of a polycrystalline material conditioned on a Laguerre tessellation
Authors:
I. Karafiátová,
J. Møller,
Z. Pawlas,
J. Staněk,
F. Seitl,
V. Beneš
Abstract:
The description of distributions related to grain microstructure helps physicists to understand the processes in materials and their properties. This paper presents a general statistical methodology for the analysis of crystallographic orientations of grains in a 3D Laguerre tessellation dataset which represents the microstructure of a polycrystalline material. We introduce complex stochastic models which may substitute expensive laboratory experiments: conditional on the Laguerre tessellation, we suggest interaction models for the distribution of cubic crystal lattice orientations, where the interaction is between pairs of orientations for neighbouring grains in the tessellation. We discuss parameter estimation and model comparison methods based on maximum pseudolikelihood as well as graphical procedures for model checking using simulations. Our methodology is applied for analysing a dataset representing a nickel-titanium shape memory alloy.
Submitted 4 November, 2022;
originally announced November 2022.
-
Determinantal shot noise Cox processes
Authors:
Jesper Møller,
Ninna Vihrs
Abstract:
We present a new class of cluster point process models, which we call determinantal shot noise Cox processes (DSNCP), with repulsion between cluster centres. They are the special case of generalized shot noise Cox processes where the cluster centres are determinantal point processes. We establish various moment results and describe how these can be used to easily estimate unknown parameters in two particularly tractable cases, namely when the offspring density is isotropic Gaussian and the kernel of the determinantal point process of cluster centres is Gaussian or as in a scaled Ginibre point process. Through a simulation study and the analysis of a real point pattern data set we see that when modelling clustered point patterns, a much lower intensity of cluster centres may be needed in DSNCP models as compared to shot noise Cox processes.
Submitted 30 May, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Fitting three-dimensional Laguerre tessellations by hierarchical marked point process models
Authors:
Filip Seitl,
Jesper Møller,
Viktor Beneš
Abstract:
We present a general statistical methodology for analysing a Laguerre tessellation data set viewed as a realization of a marked point process model. In the first step, for the points we use a nested sequence of multiscale processes which constitute a flexible parametric class of pairwise interaction point process models. In the second step, for the marks/radii conditioned on the points we consider various exponential family models where the canonical sufficient statistic is based on tessellation characteristics. For each step parameter estimation based on maximum pseudolikelihood methods is tractable. Model checking is performed using global envelopes and corresponding tests in the first step and by comparing observed and simulated tessellation characteristics in the second step. We apply our methodology for a 3D Laguerre tessellation data set representing the microstructure of a polycrystalline metallic material, where simulations under a fitted model may substitute expensive laboratory experiments.
Submitted 1 April, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
onlineforecast: An R package for adaptive and recursive forecasting
Authors:
Peder Bacher,
Hjörleifur G. Bergsteinsson,
Linde Frölke,
Mikkel L. Sørensen,
Julian Lemos-Vinasco,
Jon Liisberg,
Jan Kloppenborg Møller,
Henrik Aalborg Nielsen,
Henrik Madsen
Abstract:
Systems that rely on forecasts to make decisions, e.g. control or energy trading systems, require frequent updates of the forecasts. Usually, the forecasts are updated whenever new observations become available, hence in an online setting. We present the R package onlineforecast that provides a generalized setup of data and models for online forecasting. It has functionality for time-adaptive fitting of dynamical and non-linear models. The setup is tailored to enable the effective use of forecasts as model inputs, e.g. numerical weather forecasts. Users can create new models for their particular applications and run models in an operational setting. The package also allows users to easily replace parts of the setup, e.g. using neural network methods for estimation. The package comes with comprehensive vignettes and examples of online forecasting applications in energy systems, but can easily be applied for online forecasting in all fields.
Submitted 22 May, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Should we condition on the number of points when modelling spatial point patterns?
Authors:
Jesper Møller,
Ninna Vihrs
Abstract:
We discuss the practice of directly or indirectly assuming a model for the number of points when modelling spatial point patterns even though it is rarely possible to validate such a model in practice since most point pattern data consist of only one pattern. We therefore explore the possibility of conditioning on the number of points instead when fitting and validating spatial point process models. In a simulation study with different popular spatial point process models, we consider model validation using global envelope tests based on functional summary statistics. We find that conditioning on the number of points will for some functional summary statistics lead to narrower envelopes and that it can also be useful for correcting for some conservativeness in the tests when testing composite hypotheses. However, for other functional summary statistics, it makes little or no difference to condition on the number of points. When estimating parameters in popular spatial point process models, we conclude that for mathematical and computational reasons it is convenient to assume a distribution for the number of points.
Submitted 9 February, 2022; v1 submitted 23 August, 2021;
originally announced August 2021.
-
Digital trace data collection through data donation
Authors:
Laura Boeschoten,
Jef Ausloos,
Judith Moeller,
Theo Araujo,
Daniel L. Oberski
Abstract:
A potentially powerful method of social-scientific data collection and investigation has been created by an unexpected institution: the law. Article 15 of the EU's 2018 General Data Protection Regulation (GDPR) mandates that individuals have electronic access to a copy of their personal data, and all major digital platforms now comply with this law by providing users with "data download packages" (DDPs). Through voluntary donation of DDPs, all data collected by public and private entities during the course of citizens' digital life can be obtained and analyzed to answer social-scientific questions - with consent. Thus, consented DDPs open the way for vast new research opportunities. However, while this entirely new method of data collection will undoubtedly gain popularity in the coming years, it also comes with its own questions of representativeness and measurement quality, which are often evaluated systematically by means of an error framework. Therefore, in this paper we provide a blueprint for digital trace data collection using DDPs, and devise a "total error framework" for such projects. Our error framework for digital trace data collection through data donation is intended to facilitate high quality social-scientific investigations using DDPs while critically reflecting its unique methodological challenges and sources of error. In addition, we provide a quality control checklist to guide researchers in leveraging the vast opportunities afforded by this new mode of investigation.
Submitted 13 November, 2020;
originally announced November 2020.
-
MCMC computations for Bayesian mixture models using repulsive point processes
Authors:
Mario Beraha,
Raffaele Argiento,
Jesper Møller,
Alessandra Guglielmi
Abstract:
Repulsive mixture models have recently gained popularity for Bayesian cluster detection. Compared to more traditional mixture models, repulsive mixture models produce a smaller number of well separated clusters. The most commonly used methods for posterior inference either require fixing the number of components a priori or are based on reversible jump MCMC computation. We present a general framework for mixture models, when the prior of the 'cluster centres' is a finite repulsive point process depending on a hyperparameter, specified by a density which may depend on an intractable normalizing constant. By investigating the posterior characterization of this class of mixture models, we derive an MCMC algorithm which avoids the well-known difficulties associated with reversible jump MCMC computation. In particular, we use an ancillary variable method, which eliminates the problem of having intractable normalizing constants in the Hastings ratio. The ancillary variable method relies on a perfect simulation algorithm, and we demonstrate this is fast because the number of components is typically small. In several simulation studies and an application to sociological data, we illustrate the advantage of our new methodology over existing methods, and we compare the use of a determinantal or a repulsive Gibbs point process prior model.
Submitted 19 April, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Globally intensity-reweighted estimators for $K$- and pair correlation functions
Authors:
Thomas Shaw,
Jesper Møller,
Rasmus Waagepetersen
Abstract:
We introduce new estimators of the inhomogeneous $K$-function and the pair correlation function of a spatial point process as well as the cross $K$-function and the cross pair correlation function of a bivariate spatial point process under the assumption of second-order intensity-reweighted stationarity. These estimators rely on a 'global' normalization factor which depends on an aggregation of the intensity function, whilst the existing estimators depend 'locally' on the intensity function at the individual observed points. The advantages of our new global estimators over the existing local estimators are demonstrated by theoretical considerations and a simulation study.
Submitted 2 October, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation
Authors:
Ninna Vihrs,
Jesper Møller,
Alan E. Gelfand
Abstract:
In this paper, we propose a doubly stochastic spatial point process model with both aggregation and repulsion. This model combines the ideas behind Strauss processes and log Gaussian Cox processes. The likelihood for this model is not expressible in closed form but it is easy to simulate realisations under the model. We therefore explain how to use approximate Bayesian computation (ABC) to carry out statistical inference for this model. We suggest a method for model validation based on posterior predictions and global envelopes. We illustrate the ABC procedure and model validation approach using both simulated point patterns and a real data example.
Submitted 3 December, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Modelling columnarity of pyramidal cells in the human cerebral cortex
Authors:
Andreas D. Christoffersen,
Jesper Møller,
Heidi S. Christensen
Abstract:
For modelling the location of pyramidal cells in the human cerebral cortex we suggest a hierarchical point process in $\mathbb{R}^3$ that exhibits anisotropy in the form of cylinders extending along the $z$-axis. The model consists first of a generalised shot noise Cox process for the $xy$-coordinates, providing cylindrical clusters, and next of a Markov random field model for the $z$-coordinates conditioned on the $xy$-coordinates, providing either repulsion, aggregation, or both within specified areas of interaction. Several cases of these hierarchical point processes are fitted to two pyramidal cell datasets, and of these a final model allowing for both repulsion and attraction between the points seems adequate. We discuss how the final model relates to the so-called minicolumn hypothesis in neuroscience.
Submitted 24 November, 2020; v1 submitted 14 August, 2019;
originally announced August 2019.
-
Modelling spine locations on dendrite trees using inhomogeneous Cox point processes
Authors:
Heidi S. Christensen,
Jesper Møller
Abstract:
Dendritic spines, which are small protrusions on the dendrites of a neuron, are of interest in neuroscience as they are related to cognitive processes such as learning and memory. We analyse the distribution of spine locations on six different dendrite trees from mouse neurons using point process theory for linear networks. Besides some possible small-scale repulsion, we find that two of the spine point pattern data sets may be described by inhomogeneous Poisson process models, while the other point pattern data sets exhibit clustering between spines at a larger scale. To model this we propose an inhomogeneous Cox process model constructed by thinning a Poisson process on a linear network with retention probabilities determined by a spatially correlated random field. For model checking we consider network analogues of the empirical $F$-, $G$-, and $J$-functions originally introduced for inhomogeneous point processes on a Euclidean space. The fitted Cox process models seem to catch the clustering of spine locations between spines, but also possess a large variance in the number of points for some of the data sets, causing large confidence regions for the empirical $F$- and $G$-functions.
Submitted 25 October, 2020; v1 submitted 29 July, 2019;
originally announced July 2019.
-
Spatio-Temporal Forecasting by Coupled Stochastic Differential Equations: Applications to Solar Power
Authors:
Emil B. Iversen,
Rune Juhl,
Jan K. Møller,
Jan Kleissl,
Henrik Madsen,
Juan M. Morales
Abstract:
Spatio-temporal problems exist in many areas of knowledge and disciplines ranging from biology to engineering and physics. However, solution strategies based on classical statistical techniques often fall short due to the large number of parameters that are to be estimated and the huge amount of data that need to be handled. In this paper we apply known techniques in a novel way to provide a framework for spatio-temporal modeling which is both computationally efficient and has a low dimensional parameter space. We present a micro-to-macro approach whereby the local dynamics are first modeled and subsequently combined to capture the global system behavior. The proposed methodology relies on coupled stochastic differential equations and is applied to produce spatio-temporal forecasts for a solar power plant for very short horizons, which essentially implies tracking clouds moving across the field of solar power inverters. We outperform simple and complex benchmarks while providing forecasts for 70 spatial dimensions and 24 lead times (i.e., for a total number of random variables equal to 1680). The resulting model can provide all sorts of forecast products, ranging from point forecasts and covariances to predictive densities, multi-horizon forecasts, and space-time trajectories.
Submitted 14 June, 2017;
originally announced June 2017.
-
Some recent developments in statistics for spatial point patterns
Authors:
Jesper Møller,
Rasmus Waagepetersen
Abstract:
This paper reviews developments in statistics for spatial point processes obtained within roughly the last decade. These developments include new classes of spatial point process models such as determinantal point processes, models incorporating both regularity and aggregation, and models where points are randomly distributed around latent geometric structures. Regarding parametric inference the main focus is on various types of estimating functions derived from so-called innovation measures. Optimality of such estimating functions is discussed as well as computational issues. Maximum likelihood inference for determinantal point processes and Bayesian inference are briefly considered too. Concerning non-parametric inference, we consider extensions of functional summary statistics to the case of inhomogeneous point processes as well as new approaches to simulation based inference.
Submitted 4 September, 2016;
originally announced September 2016.
-
Investigations of the effects of random sampling patterns on the stability of generalized sampling
Authors:
Robert Dahl Jacobsen,
Jesper Møller,
Morten Nielsen,
Morten Grud Rasmussen
Abstract:
We investigate how the choice of spatial point process for generating random sampling patterns affects the numerical stability of non-uniform generalized sampling between Fourier bases and Daubechies scaling functions. Specifically, we consider binomial, Poisson and determinantal point processes and demonstrate that the more regular point patterns from the determinantal point process are superior.
Submitted 17 September, 2017; v1 submitted 15 July, 2016;
originally announced July 2016.
-
Determinantal point process models on the sphere
Authors:
Jesper Møller,
Morten Nielsen,
Emilio Porcu,
Ege Rubak
Abstract:
We consider determinantal point processes on the $d$-dimensional unit sphere $\mathbb S^d$. These are finite point processes exhibiting repulsiveness and with moment properties determined by a certain determinant whose entries are specified by a so-called kernel which we assume is a complex covariance function defined on $\mathbb S^d\times\mathbb S^d$. We review the appealing properties of such processes, including their specific moment properties, density expressions and simulation procedures. Particularly, we characterize and construct isotropic DPP models on $\mathbb{S}^d$, where it becomes essential to specify the eigenvalues and eigenfunctions in a spectral representation for the kernel, and we figure out how repulsive isotropic DPPs can be. Moreover, we discuss the shortcomings of adapting existing models for isotropic covariance functions and consider strategies for developing new models, including a useful spectral approach.
Submitted 13 July, 2016;
originally announced July 2016.
-
ctsmr - Continuous Time Stochastic Modeling in R
Authors:
Rune Juhl,
Jan Kloppenborg Møller,
Henrik Madsen
Abstract:
ctsmr is an R package providing a general framework for identifying and estimating partially observed continuous-discrete time grey-box models. The estimation is based on maximum likelihood principles and Kalman filtering efficiently implemented in Fortran. This paper briefly demonstrates how to construct a Continuous Time Stochastic Model using multivariate time series data, and how to estimate the embedded parameters. The setup provides a unique framework for statistical modeling of physical phenomena, and the approach is often called grey-box modeling. Finally, three examples are provided to demonstrate the capabilities of ctsmr.
Submitted 1 June, 2016;
originally announced June 2016.
-
A Unified View of Localized Kernel Learning
Authors:
John Moeller,
Sarathkrishna Swaminathan,
Suresh Venkatasubramanian
Abstract:
Multiple Kernel Learning, or MKL, extends (kernelized) SVM by attempting to learn not only a classifier/regressor but also the best kernel for the training task, usually from a combination of existing kernel functions. Most MKL methods seek the combined kernel that performs best over every training example, sacrificing performance in some areas to seek a global optimum. Localized kernel learning (LKL) overcomes this limitation by allowing the training algorithm to match a component kernel to the examples that can exploit it best. Several approaches to the localized kernel learning problem have been explored in the last several years. We unify many of these approaches under one simple system and design a new algorithm with improved performance. We also develop enhanced versions of existing algorithms, with an eye on scalability and performance.
Submitted 4 March, 2016;
originally announced March 2016.
-
Functional summary statistics for point processes on the sphere with an application to determinantal point processes
Authors:
Jesper Møller,
Ege Rubak
Abstract:
We study point processes on the $d$-dimensional unit sphere $\mathbb S^d$, considering both the isotropic and the anisotropic case, and focusing mostly on the spherical case $d=2$. The first part studies reduced Palm distributions and functional summary statistics, including nearest neighbour functions, empty space functions, and Ripley's and inhomogeneous $K$-functions. The second part partly discusses the appealing properties of determinantal point process (DPP) models on the sphere and partly considers the application of functional summary statistics to DPPs. In fact DPPs exhibit repulsiveness, but we also use them together with certain dependent thinnings when constructing point process models on the sphere with aggregation on the large scale and regularity on the small scale. We conclude with a discussion on future work on statistics for spatial point processes on the sphere.
Submitted 12 June, 2016; v1 submitted 13 January, 2016;
originally announced January 2016.
-
Modelling aggregation on the large scale and regularity on the small scale in spatial point pattern datasets
Authors:
Frédéric Lavancier,
Jesper Møller
Abstract:
We consider a dependent thinning of a regular point process with the aim of obtaining aggregation on the large scale and regularity on the small scale in the resulting target point process of retained points. Various parametric models for the underlying processes are suggested and the properties of the target point process are studied. Simulation and inference procedures are discussed when a realization of the target point process is observed, depending on whether the thinned points are observed or not. The paper extends previous work by Dietrich Stoyan on interrupted point processes.
Submitted 27 May, 2015;
originally announced May 2015.
-
Certifying and removing disparate impact
Authors:
Michael Feldman,
Sorelle Friedler,
John Moeller,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process.
When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses.
We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
Submitted 15 July, 2015; v1 submitted 11 December, 2014;
originally announced December 2014.
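One simple way to quantify "widely different outcomes for different groups", as described in the abstract above, is the ratio of selection rates between a protected group and a reference group. The sketch below is an illustration only. The 0.8 threshold is the conventional "four-fifths" rule of thumb from U.S. employment guidelines, which the abstract itself does not state, and the function names and the toy data are hypothetical.

```python
def selection_rate(outcomes):
    """Fraction of positive (selected) outcomes in a list of 0/1 values."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, reference):
    """Selection rate of the protected group divided by that of the reference group."""
    return selection_rate(protected) / selection_rate(reference)


# Toy data (made up for illustration): 1 = selected, 0 = not selected.
protected = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 of 10 selected
reference = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 6 of 10 selected

ratio = disparate_impact_ratio(protected, reference)
flagged = ratio < 0.8  # assumed four-fifths threshold, not from the abstract
print(round(ratio, 3), flagged)
```

This measures outcomes only. The test proposed in the paper is instead based on how well the protected class can be predicted from the other attributes, which this sketch does not attempt to reproduce.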
-
Fingerprint Analysis with Marked Point Processes
Authors:
Peter G. M. Forbes,
Steffen Lauritzen,
Jesper Møller
Abstract:
We present a framework for fingerprint matching based on marked point process models. An efficient Monte Carlo algorithm is developed to calculate the marginal likelihood ratio for the hypothesis that two observed prints originate from the same finger against the hypothesis that they originate from different fingers. Our model achieves good performance on an NIST-FBI fingerprint database of 258 matched fingerprint pairs.
Submitted 22 July, 2014;
originally announced July 2014.
-
Frequentist and Bayesian inference for Gaussian-log-Gaussian wavelet trees, and statistical signal processing applications
Authors:
Robert Dahl Jacobsen,
Jesper Møller
Abstract:
We introduce new estimation methods for a sub-class of the Gaussian scale mixture models for wavelet trees by Wainwright, Simoncelli & Willsky that rely on modern results for composite likelihoods and approximate Bayesian inference. Our methodology is illustrated for denoising and edge detection problems in two-dimensional images.
Submitted 10 August, 2017; v1 submitted 2 May, 2014;
originally announced May 2014.
-
Probabilistic Forecasts of Solar Irradiance by Stochastic Differential Equations
Authors:
Emil B. Iversen,
Juan M. Morales,
Jan K. Møller,
Henrik Madsen
Abstract:
Probabilistic forecasts of renewable energy production provide users with valuable information about the uncertainty associated with the expected generation. Current state-of-the-art forecasts for solar irradiance have focused on producing reliable \emph{point} forecasts. The additional information included in probabilistic forecasts may be paramount for decision makers to efficiently make use of this uncertain and variable generation. In this paper, a stochastic differential equation (SDE) framework for modeling the uncertainty associated with the solar irradiance point forecast is proposed. This modeling approach allows for characterizing both the interdependence structure of prediction errors of short-term solar irradiance and their predictive distribution. A series of different SDE models are fitted to a training set and subsequently evaluated on a one-year test set. The final model proposed is defined on a bounded and time-varying state space with zero probability almost surely of events outside this space.
Submitted 25 October, 2013;
originally announced October 2013.
-
A Geometric Algorithm for Scalable Multiple Kernel Learning
Authors:
John Moeller,
Parasaran Raman,
Avishek Saha,
Suresh Venkatasubramanian
Abstract:
We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex polytopes. This interpretation combined with novel structural insights from our geometric formulation allows us to reduce the MKL problem to a simple optimization routine that yields provable convergence as well as quality guarantees. As a result our method scales efficiently to much larger data sets than most prior methods can handle. Empirical evaluation on eleven datasets shows that we are significantly faster and even compare favorably with a uniform unweighted combination of kernels.
Submitted 15 March, 2014; v1 submitted 25 June, 2012;
originally announced June 2012.
-
Score, Pseudo-Score and Residual Diagnostics for Spatial Point Process Models
Authors:
Adrian Baddeley,
Ege Rubak,
Jesper Møller
Abstract:
We develop new tools for formal inference and informal model validation in the analysis of spatial point pattern data. The score test is generalized to a "pseudo-score" test derived from Besag's pseudo-likelihood, and to a class of diagnostics based on point process residuals. The results lend theoretical support to the established practice of using functional summary statistics, such as Ripley's $K$-function, when testing for complete spatial randomness; and they provide new tools such as the compensator of the $K$-function for testing other fitted models. The results also support localization methods such as the scan statistic and smoothed residual plots. Software for computing the diagnostics is provided.
Submitted 17 May, 2012;
originally announced May 2012.
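For readers unfamiliar with the functional summary statistic named in the abstract above, here is a minimal sketch of a naive estimator of Ripley's $K$-function for a planar pattern. It uses one common form, $\hat K(r) = |W| \sum_{i \neq j} 1\{d_{ij} \le r\} / (n(n-1))$, with no edge correction. It is not the software provided with the paper, and the function name `ripley_k_naive` is hypothetical.

```python
import math


def ripley_k_naive(points, r, area):
    """Naive K-function estimate without edge correction.

    Counts ordered pairs (i, j), i != j, at distance <= r and scales the
    count by area / (n * (n - 1)).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= r
    )
    return area * close_pairs / (n * (n - 1))


# Toy pattern in the unit square: two close points and one far away.
pattern = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9)]
k_hat = ripley_k_naive(pattern, r=0.15, area=1.0)

# Under complete spatial randomness in the plane, K(r) = pi * r^2.
k_csr = math.pi * 0.15 ** 2
print(k_hat, k_csr)
```

Values of the estimate well above $\pi r^2$ suggest clustering at scale $r$, and values well below suggest regularity. In practice an edge-corrected estimator would be used, since the naive version is biased downward near the window boundary.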