Showing 1–50 of 81 results for author: Mengersen, K

Searching in archive stat.
  1. arXiv:2406.19591  [pdf, other]

    stat.AP q-bio.PE

    Mathematical modelling and uncertainty quantification for analysis of biphasic coral reef recovery patterns

    Authors: David J. Warne, Kerryn Crossman, Grace E. M. Heron, Jesse A. Sharp, Wang Jin, Paul Pao-Yen Wu, Matthew J. Simpson, Kerrie Mengersen, Juan-Carlos Ortiz

    Abstract: Coral reefs are increasingly subjected to major disturbances threatening the health of marine ecosystems. Substantial research is underway to develop intervention strategies that assist reefs in recovery from, and resistance to, inevitable future climate and weather extremes. To assess potential benefits of interventions, mechanistic understanding of coral reef recovery and resistance patterns is ess…

    Submitted 27 June, 2024; originally announced June 2024.

    MSC Class: 62P12 (Primary)

  2. arXiv:2405.16055  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Federated Learning for Non-factorizable Models using Deep Generative Prior Approximations

    Authors: Conor Hassan, Joshua J Bon, Elizaveta Semenova, Antonietta Mira, Kerrie Mengersen

    Abstract: Federated learning (FL) allows for collaborative model training across decentralized clients while preserving privacy by avoiding data sharing. However, current FL methods assume conditional independence between client models, limiting the use of priors that capture dependence, such as Gaussian processes (GPs). We introduce the Structured Independence via deep Generative Model Approximation (SIGMA…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 25 pages, 7 figures, 2 tables

  3. arXiv:2405.04043  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

    Authors: Conor Hassan, Matthew Sutton, Antonietta Mira, Kerrie Mengersen

    Abstract: Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existin…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 30 pages, 5 figures, 3 tables

  4. arXiv:2404.12657  [pdf, other]

    stat.AP cs.CR

    Proposer selection in EIP-7251

    Authors: Sandra Johnson, Kerrie Mengersen, Patrick O'Callaghan, Anders L. Madsen

    Abstract: Immediate settlement, or single-slot finality (SSF), is a long-term goal for Ethereum. The growing active validator set size is placing an increasing computational burden on the network, making SSF more challenging. EIP-7251 aims to reduce the number of validators by giving stakers the option to merge existing validators. Key to the success of this proposal therefore is whether stakers choose to m…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 15 pages

    MSC Class: 62-06 ACM Class: G.3

  5. arXiv:2403.14954  [pdf, other]

    stat.ME stat.AP

    Creating a Spatial Vulnerability Index for Environmental Health

    Authors: Aiden Price, Kerrie Mengersen, Michael Rigby, Paula Fiévez

    Abstract: Extreme natural hazards are increasing in frequency and intensity. These natural changes in our environment, combined with man-made pollution, have substantial economic, social and health impacts globally. The impact of the environment on human health (environmental health) is becoming well understood in international research literature. However, there are significant barriers to understanding ke…

    Submitted 22 March, 2024; originally announced March 2024.

  6. arXiv:2403.13076  [pdf, other]

    stat.ME

    Spatial Autoregressive Model on a Dirichlet Distribution

    Authors: Teo Nguyen, Sarat Moka, Kerrie Mengersen, Benoit Liquet

    Abstract: Compositional data find broad application across diverse fields due to their efficacy in representing proportions or percentages of various components within a whole. Spatial dependencies often exist in compositional data, particularly when the data represents different land uses or ecological variables. Ignoring the spatial autocorrelations in modelling of compositional data may lead to incorrect…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 33 pages, 2 figures, submitted to "Computational Statistics & Data Analysis"

  7. arXiv:2403.10791  [pdf, other]

    stat.AP

    Bayesian Design for Sampling Anomalous Spatio-Temporal Data

    Authors: Katie Buchhorn, Kerrie Mengersen, Edgar Santos-Fernandez, James McGree

    Abstract: Data collected from arrays of sensors are essential for informed decision-making in various systems. However, the presence of anomalies can compromise the accuracy and reliability of insights drawn from the collected data or information obtained via statistical analysis. This study aims to develop a robust Bayesian optimal experimental design (BOED) framework with anomaly detection methods for hig…

    Submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2403.08127  [pdf]

    cs.DB physics.data-an stat.OT

    Guidelines for the Creation of Analysis Ready Data

    Authors: Harriette Phillips, Aiden Price, Owen Forbes, Claire Boulange, Kerrie Mengersen, Marketa Reeves, Rebecca Glauert

    Abstract: Globally, there is an increased need for guidelines to produce high-quality data outputs for analysis. No framework currently exists that provides guidelines for a comprehensive approach to producing analysis ready data (ARD). Through critically reviewing and summarising current literature, this paper proposes such guidelines for the creation of ARD. The guidelines proposed in this paper inform te…

    Submitted 29 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 49 pages, 3 figures, 3 tables, and 5 appendices

  9. arXiv:2403.00319  [pdf, other]

    stat.AP

    Creating area level indices of behaviours impacting cancer in Australia with a Bayesian generalised shared component model

    Authors: James Hogg, Susanna Cramb, Jessica Cameron, Peter Baade, Kerrie Mengersen

    Abstract: This study develops a model-based index creation approach called the Generalized Shared Component Model (GSCM) by drawing on the large field of factor models. The proposed fully Bayesian approach accommodates heteroscedastic model error, multiple shared factors and flexible spatial priors. Moreover, our model, unlike previous index approaches, provides indices with uncertainty. Focusing on Austral…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Submitted to Health and Place

  10. arXiv:2311.12349  [pdf, other]

    stat.AP

    Spatial Non-parametric Bayesian Clustered Coefficients

    Authors: Wala Draidi Areed, Aiden Price, Helen Thompson, Reid Malseed, Kerrie Mengersen

    Abstract: In the field of population health research, understanding the similarities between geographical areas and quantifying their shared effects on health outcomes is crucial. In this paper, we synthesise a number of existing methods to create a new approach that specifically addresses this goal. The approach is called a Bayesian spatial Dirichlet process clustered heterogeneous regression model. This n…

    Submitted 22 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

  11. arXiv:2311.12347  [pdf, other]

    stat.ME stat.AP

    Bayesian Cluster Geographically Weighted Regression for Spatial Heterogeneous Data

    Authors: Wala Draidi Areed, Aiden Price, Helen Thompson, Conor Hassan, Reid Malseed, Kerrie Mengersen

    Abstract: Spatial statistical models are commonly used in geographical scenarios to ensure spatial variation is captured effectively. However, spatial models and cluster algorithms can be complicated and expensive. This paper pursues three main objectives. First, it introduces covariate effect clustering by integrating a Bayesian Geographically Weighted Regression (BGWR) with a Gaussian mixture model and th…

    Submitted 20 November, 2023; originally announced November 2023.

  12. Mapping the prevalence of cancer risk factors at the small area level in Australia

    Authors: James Hogg, Jessica Cameron, Susanna Cramb, Peter Baade, Kerrie Mengersen

    Abstract: Cancer is a significant health issue globally and it is well known that cancer risk varies geographically. However in many countries there are no small area level data on cancer risk factors with high resolution and complete reach, which hinders the development of targeted prevention strategies. Using Australia as a case study, the 2017-2018 National Health Survey was used to generate prevalence e…

    Submitted 22 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Submitted to the International Journal of Health Geographics

  13. arXiv:2308.03970  [pdf, other]

    cs.DS stat.ML

    Dependent Cluster Mapping (DCMAP): Optimal clustering of directed acyclic graphs for statistical inference

    Authors: Paul Pao-Yen Wu, Fabrizio Ruggeri, Kerrie Mengersen

    Abstract: A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support and make inference more computationally efficient in Bayesian Network (BN), Markov process and other models. However, optimal partitioning with an arbitrary cost function is challenging, especially in statistical inference as the local cluster cost is dependent on both nodes within a cluster, and the mapping of cl…

    Submitted 7 February, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

  14. arXiv:2307.15424  [pdf, ps, other]

    cs.LG stat.AP stat.CO stat.ML

    Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

    Authors: Conor Hassan, Robert Salomone, Kerrie Mengersen

    Abstract: This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the context of privacy-sensitive data. Additionally, we highlight the advantages of using deep generative models over other methods and provide a detailed explanation…

    Submitted 27 August, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

  15. arXiv:2306.11302  [pdf, other]

    stat.ME stat.AP

    A Two-Stage Bayesian Small Area Estimation Approach for Proportions

    Authors: James Hogg, Jessica Cameron, Susanna Cramb, Peter Baade, Kerrie Mengersen

    Abstract: With the rise in popularity of digital Atlases to communicate spatial variation, there is an increasing need for robust small-area estimates. However, current small-area estimation methods suffer from various modeling problems when data are very sparse or when estimates are required for areas with very small populations. These issues are particularly heightened when modeling proportions. Additiona…

    Submitted 4 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Currently in second round of review at the International Statistical Review

  16. arXiv:2306.01278  [pdf, ps, other]

    math.ST stat.CO

    The Fisher Geometry and Geodesics of the Multivariate Normals, without Differential Geometry

    Authors: Brodie A. J. Lawson, Kevin Burrage, Kerrie Mengersen, Rodrigo Weber dos Santos

    Abstract: Choosing the Fisher information as the metric tensor for a Riemannian manifold provides a powerful yet fundamental way to understand statistical distribution families. Distances along this manifold become a compelling measure of statistical distance, and paths of shorter distance improve sampling techniques that leverage a sequence of distributions in their operation. Unfortunately, even for a dis…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: 22 pages, 8 figures, further figures and algorithms in supplement

    MSC Class: 62B11 (primary); 62-02; 62-08 (secondary) ACM Class: G.3

  17. arXiv:2305.15746  [pdf, other]

    stat.ML cs.LG

    Assessing the Spatial Structure of the Association between Attendance at Preschool and Childrens Developmental Vulnerabilities in Queensland Australia

    Authors: Wala Draidi Areed, Aiden Price, Kathryn Arnett, Helen Thompson, Reid Malseed, Kerrie Mengersen

    Abstract: The research explores the influence of preschool attendance (one year before full-time school) on the development of children during their first year of school. Using data collected by the Australian Early Development Census, the findings show that areas with high proportions of preschool attendance tended to have lower proportions of children with at least one developmental vulnerability. Develop…

    Submitted 25 May, 2023; originally announced May 2023.

  18. arXiv:2305.12651  [pdf, other]

    stat.ME stat.AP stat.CO

    Conditional normalization in time series analysis

    Authors: Puwasala Gamakumara, Edgar Santos-Fernandez, Priyanga Dilini Talagala, Rob J. Hyndman, Kerrie Mengersen, Catherine Leigh

    Abstract: Time series often reflect variation associated with other related variables. Controlling for the effect of these variables is useful when modeling or analysing the time series. We introduce a novel approach to normalize time series data conditional on a set of covariates. We do this by modeling the conditional mean and the conditional variance of the time series with generalized additive models us…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: 36 pages, 26 Figures, Journal Article

  19. arXiv:2305.01144  [pdf, other]

    stat.AP

    Increasing trust in new data sources: crowdsourcing image classification for ecology

    Authors: Edgar Santos-Fernandez, Julie Vercelloni, Aiden Price, Grace Heron, Bryce Christensen, Erin E. Peterson, Kerrie Mengersen

    Abstract: Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields to inform data-driven decisions and study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addres…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 25 pages, 10 figures

  20. arXiv:2304.09367  [pdf, other]

    cs.LG stat.AP

    Graph Neural Network-Based Anomaly Detection for River Network Systems

    Authors: Katie Buchhorn, Edgar Santos-Fernandez, Kerrie Mengersen, Robert Salomone

    Abstract: Water is the lifeblood of river networks, and its quality plays a crucial role in sustaining both aquatic ecosystems and human societies. Real-time monitoring of water quality is increasingly reliant on in-situ sensor technology. Anomaly detection is crucial for identifying erroneous patterns in sensor data, but can be a challenging task due to the complexity and variability of the data, even unde…

    Submitted 31 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  21. arXiv:2302.08724  [pdf, other]

    stat.ML cs.LG stat.OT

    Piecewise Deterministic Markov Processes for Bayesian Neural Networks

    Authors: Ethan Goan, Dimitri Perrin, Kerrie Mengersen, Clinton Fookes

    Abstract: Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to its incompatibility to subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subs…

    Submitted 19 October, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Includes correction to software and corrigendum note

  22. arXiv:2302.03314  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Federated Variational Inference Methods for Structured Latent Variable Models

    Authors: Conor Hassan, Robert Salomone, Kerrie Mengersen

    Abstract: Federated learning methods enable model training across distributed data sources without data leaving their original locations and have gained increasing interest in various fields. However, existing approaches are limited, excluding many structured probabilistic models. We present a general and elegant solution based on structured variational inference, widely used in Bayesian machine learning, a…

    Submitted 7 July, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  23. Where are the vulnerable children? Identification and comparison of clusters of young children with health and developmental vulnerabilities across Queensland

    Authors: Wala Draidi Areed, Aiden Price, Kathryn Arnett, Kerrie Mengersen, Helen Thompson

    Abstract: This study aimed to better understand the vulnerability of 5 to 6 year old children in their first year of school, based on five health and development domains. Identification of subgroups of children within these domains can lead to more targeted research and policies to reduce these vulnerabilities. The study focused on finding clusters of geographical regions with high and low proportions of vu…

    Submitted 13 December, 2022; originally announced December 2022.

  24. arXiv:2211.10029  [pdf, other]

    stat.AP

    Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics

    Authors: Joshua J. Bon, Adam Bretherton, Katie Buchhorn, Susanna Cramb, Christopher Drovandi, Conor Hassan, Adrianne L. Jenner, Helen J. Mayfield, James M. McGree, Kerrie Mengersen, Aiden Price, Robert Salomone, Edgar Santos-Fernandez, Julie Vercelloni, Xiaoyu Wang

    Abstract: Building on a strong foundation of philosophy, theory, methods and computation over the past three decades, Bayesian approaches are now an integral part of the toolkit for most statisticians and data scientists. Whether they are dedicated Bayesians or opportunistic users, applied professionals can now reap many of the benefits afforded by the Bayesian paradigm. In this paper, we touch on six moder…

    Submitted 17 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 27 pages, 8 figures

  25. arXiv:2209.04117  [pdf, other]

    stat.ME stat.AP stat.ML

    clusterBMA: Bayesian model averaging for clustering

    Authors: Owen Forbes, Edgar Santos-Fernandez, Paul Pao-Yen Wu, Hong-Bo Xie, Paul E. Schwenn, Jim Lagopoulos, Lia Mills, Dashiell D. Sacks, Daniel F. Hermens, Kerrie Mengersen

    Abstract: Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and…

    Submitted 25 March, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

  26. arXiv:2208.02921  [pdf, other]

    stat.ME stat.AP stat.CO

    A flexible, random histogram kernel for discrete-time Hawkes processes

    Authors: Raiha Browning, Judith Rousseau, Kerrie Mengersen

    Abstract: Hawkes processes are a self-exciting stochastic process used to describe phenomena whereby past events increase the probability of the occurrence of future events. This work presents a flexible approach for modelling a variant of these, namely discrete-time Hawkes processes. Most standard models of Hawkes processes rely on a parametric form for the function describing the influence of past events,…

    Submitted 4 August, 2022; originally announced August 2022.

    MSC Class: 60G55; 62F15

  27. arXiv:2206.05369  [pdf, other]

    stat.ME stat.AP

    Bayesian Design with Sampling Windows for Complex Spatial Processes

    Authors: Katie Buchhorn, Kerrie Mengersen, Edgar Santos-Fernandez, Erin E. Peterson, James M. McGree

    Abstract: Optimal design facilitates intelligent data collection. In this paper, we introduce a fully Bayesian design approach for spatial processes with complex covariance structures, like those typically exhibited in natural ecosystems. Coordinate Exchange algorithms are commonly used to find optimal design points. However, collecting data at specific points is often infeasible in practice. Currently, the…

    Submitted 10 June, 2022; originally announced June 2022.

  28. Analysis of sloppiness in model simulations: unveiling parameter uncertainty when mathematical models are fitted to data

    Authors: Gloria M. Monsalve-Bravo, Brodie A. J. Lawson, Christopher Drovandi, Kevin Burrage, Kevin S. Brown, Christopher M. Baker, Sarah A. Vollert, Kerrie Mengersen, Eve McDonald-Madden, Matthew P. Adams

    Abstract: This work introduces a comprehensive approach to assess the sensitivity of model outputs to changes in parameter values, constrained by the combination of prior beliefs and data. This novel approach identifies stiff parameter combinations strongly affecting the quality of the model-data fit while simultaneously revealing which of these key parameter combinations are informed primarily by the data…

    Submitted 21 September, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Journal ref: Sci.Adv. 8(38) eabm5952 (2022)

  29. Stateful to Stateless: Modelling Stateless Ethereum

    Authors: Sandra Johnson, David Hyland-Wood, Anders L Madsen, Kerrie Mengersen

    Abstract: The concept of 'Stateless Ethereum' was conceived with the primary aim of mitigating Ethereum's unbounded state growth. The key facilitator of Stateless Ethereum is through the introduction of 'witnesses' into the ecosystem. The changes and potential consequences that these additional data packets pose on the network need to be identified and analysed to ensure that the Ethereum ecosystem can cont…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: In Proceedings MARS 2022, arXiv:2203.09299

    Journal ref: EPTCS 355, 2022, pp. 27-39

  30. arXiv:2203.04165  [pdf, other]

    stat.AP stat.CO stat.ML

    On the intrinsic dimensionality of Covid-19 data: a global perspective

    Authors: Abhishek Varghese, Edgar Santos-Fernandez, Francesco Denti, Antonietta Mira, Kerrie Mengersen

    Abstract: This paper aims to develop a global perspective of the complexity of the relationship between the standardised per-capita growth rate of Covid-19 cases, deaths, and the OxCGRT Covid-19 Stringency Index, a measure describing a country's stringency of lockdown policies. To achieve our goal, we use a heterogeneous intrinsic dimension estimator implemented as a Bayesian mixture model, called Hidalgo.…

    Submitted 8 March, 2022; originally announced March 2022.

    MSC Class: 62P10

  31. arXiv:2202.07166  [pdf, other]

    stat.CO stat.ME

    SSNbayes: An R package for Bayesian spatio-temporal modelling on stream networks

    Authors: Edgar Santos-Fernandez, Jay M. Ver Hoef, James M. McGree, Daniel J. Isaak, Kerrie Mengersen, Erin E. Peterson

    Abstract: Spatio-temporal models are widely used in many research areas from ecology to epidemiology. However, most covariance functions describe spatial relationships based on Euclidean distance only. In this paper, we introduce the R package SSNbayes for fitting Bayesian spatio-temporal models and making predictions on branching stream networks. SSNbayes provides a linear regression framework with multipl…

    Submitted 14 February, 2022; originally announced February 2022.

  32. arXiv:2107.12592  [pdf, other]

    cs.CR stat.ME

    Detection of cybersecurity attacks through analysis of web browsing activities using principal component analysis

    Authors: Insha Ullah, Kerrie Mengersen, Rob J Hyndman, James McGree

    Abstract: Organizations such as government departments and financial institutions provide online service facilities accessible via an increasing number of internet connected devices which make their operational environment vulnerable to cyber attacks. Consequently, there is a need to have mechanisms in place to detect cyber security attacks in a timely manner. A variety of Network Intrusion Detection System…

    Submitted 27 July, 2021; originally announced July 2021.

  33. Understanding links between water-quality variables and nitrate concentration in freshwater streams using high-frequency sensor data

    Authors: Claire Kermorvant, Benoit Liquet, Guy Litt, Kerrie Mengersen, Erin Peterson, Rob Hyndman, Jeremy B. Jones Jr., Catherine Leigh

    Abstract: Real time monitoring using in situ sensors is becoming a common approach for measuring water quality within watersheds. High frequency measurements produce big data sets that present opportunities to conduct new analyses for improved understanding of water quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: 4 figures, 17 pages

    MSC Class: I.2.7 ACM Class: F.2.2

  34. arXiv:2105.02140  [pdf, other]

    stat.AP stat.CO stat.ME

    A Bayesian latent allocation model for clustering compositional data with application to the Great Barrier Reef

    Authors: Luiza Piancastelli, Nial Friel, Julie Vercelloni, Kerrie Mengersen, Antonietta Mira

    Abstract: Relative abundance is a common metric to estimate the composition of species in ecological surveys reflecting patterns of commonness and rarity of biological assemblages. Measurements of coral reef compositions formed by four communities along Australia's Great Barrier Reef (GBR) gathered between 2012 and 2017 are the focus of this paper. We undertake the task of finding clusters of transect locat…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: Paper submitted for publication

  35. Spatio-temporal quantile regression analysis revealing more nuanced patterns of climate change: a study of long-term daily temperature in Australia

    Authors: Qibin Duan, Clare A. McGrory, Glenn Brown, Kerrie Mengersen, You-Gan Wang

    Abstract: Climate change is commonly associated with an overall increase in mean temperature in a defined past time period. Many studies consider temperature trends at the global scale, but the literature is lacking in in-depth analysis of the temperature trends across Australia in recent decades. In addition to heterogeneity in mean and median values, daily Australia temperature data suffers from quasi-per…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: 30 pages, 10 figures, and 3 tables

  36. Bayesian spatio-temporal models for stream networks

    Authors: Edgar Santos-Fernandez, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Daniel Isaak, Kerrie Mengersen

    Abstract: Spatio-temporal models are widely used in many research areas including ecology. The recent proliferation of the use of in-situ sensors in streams and rivers supports space-time water quality modelling and monitoring in near real-time. A new family of spatio-temporal models is introduced. These models incorporate spatial dependence using stream distance while temporal autocorrelation is captured u…

    Submitted 14 February, 2022; v1 submitted 5 March, 2021; originally announced March 2021.

    Comments: 30 pages, 10 figs

  37. arXiv:2011.08407  [pdf, other]

    stat.AP

    A statistical machine learning approach for benchmarking in the presence of complex contextual factors and peer groups

    Authors: Daniel W. Kennedy, Jessica Cameron, Paul P.-Y. Wu, Kerrie Mengersen

    Abstract: The ability to compare between individuals or organisations fairly is important for the development of robust and meaningful quantitative benchmarks. To make fair comparisons, contextual factors must be taken into account, and comparisons should only be made between similar organisations such as peer groups. Previous benchmarking methods have used linear regression to adjust for contextual factors…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: 18 pages, 8 figures

  38. Peer groups for organisational learning: clustering with practical constraints

    Authors: Daniel William Kennedy, Jessica Cameron, Paul Pao-Yen Wu, Kerrie Mengersen

    Abstract: Peer-grouping is used in many sectors for organisational learning, policy implementation, and benchmarking. Clustering provides a statistical, data-driven method for constructing meaningful peer groups, but peer groups must be compatible with business constraints such as size and stability considerations. Additionally, statistical peer groups are constructed from many different variables, and can…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: 22 pages, 4 figures

  39. A Survey of Bayesian Statistical Approaches for Big Data

    Authors: Farzana Jahan, Insha Ullah, Kerrie L Mengersen

    Abstract: The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We present a review of published studies that present Bayesian statistical approaches specifically for Big…

    Submitted 8 June, 2020; originally announced June 2020.

    MSC Class: 62-08; 97K70; 97K80 ACM Class: G.3

    Journal ref: In Mengersen K., Pudlo P., Robert C. (2020) Case Studies in Applied Bayesian Data Science. Lecture Notes in Mathematics, vol 2259. (pp. 17-44) Springer, Cham

  40. arXiv:2006.00741  [pdf, other]

    stat.AP stat.OT

    Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective

    Authors: Edgar Santos-Fernandez, Erin E. Peterson, Julie Vercelloni, Em Rushworth, Kerrie Mengersen

    Abstract: Many research domains use data elicited from "citizen scientists" when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest, while accounting for the participants' abilities. The model is des…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 18 figures, 5 tables

  41. arXiv:2004.04620  [pdf, ps, other]

    stat.CO

    Bayesian Computation with Intractable Likelihoods

    Authors: Matthew T. Moores, Anthony N. Pettitt, Kerrie Mengersen

    Abstract: This article surveys computational methods for posterior inference with intractable likelihoods, that is where the likelihood function is unavailable in closed form, or where evaluation of the likelihood is infeasible. We review recent developments in pseudo-marginal methods, approximate Bayesian computation (ABC), the exchange algorithm, thermodynamic integration, and composite likelihood, paying…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: arXiv admin note: text overlap with arXiv:1503.08066

    MSC Class: 62F15; 62M40

  42. arXiv:2003.06966  [pdf, other]

    stat.AP

    Bayesian item response models for citizen science ecological data

    Authors: Edgar Santos-Fernandez, Kerrie Mengersen

    Abstract: So-called 'citizen science' data elicited from crowds has become increasingly popular in many fields including ecology. However, the quality of this information is being frequently debated by many within the scientific community. Therefore, modern citizen science implementations require measures of the users' proficiency that account for the difficulty of the tasks. We introduce a new methodologic…

    Submitted 25 May, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

    Comments: under review, 24 pages, 10 figures

  43. arXiv:2003.06291  [pdf]

    stat.CO cs.DB stat.AP

    Improved assessment of the accuracy of record linkage via an extended MaCSim approach

    Authors: Shovanur Haque, Kerrie Mengersen

    Abstract: Record linkage is the process of bringing together the same entity from overlapping data sources while removing duplicates. Huge amounts of data are now being collected by public or private organizations as well as by researchers and individuals. Linking and analysing relevant information from this massive data reservoir can provide new insights into society. However, this increase in the amount o…

    Submitted 12 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: 32 pages, 4 figures. arXiv admin note: text overlap with arXiv:1901.04779

  44. arXiv:2003.05686  [pdf]

    stat.CO cs.DB stat.AP

    Assessing the accuracy of individual link with varying block sizes and cut-off values using MaCSim approach

    Authors: Shovanur Haque, Kerrie Mengersen

    Abstract: Record linkage is the process of matching together records from different data sources that belong to the same entity. Record linkage is increasingly being used by many organizations including statistical, health, government etc. to link administrative, survey, and other files to create a robust file for more comprehensive analysis. Therefore, it becomes necessary to assess the ability of a linkin…

    Submitted 23 November, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: 24 pages, 6 figures. arXiv admin note: text overlap with arXiv:1901.04779

  45. arXiv:2002.04148  [pdf, other]

    stat.AP

    The role of intrinsic dimension in high-resolution player tracking data -- Insights in basketball

    Authors: Edgar Santos-Fernandez, Francesco Denti, Kerrie Mengersen, Antonietta Mira

    Abstract: A new range of statistical analysis has emerged in sports after the introduction of the high-resolution player tracking technology, specifically in basketball. However, this high dimensional data is often challenging for statistical inference and decision making. In this article, we employ Hidalgo, a state-of-the-art Bayesian mixture model that allows the estimation of heterogeneous intrinsic dime…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: 21 pages, 16 figures, Codes + data + results can be found in https://github.com/EdgarSantos-Fernandez/id_basketball, Submitted

  46. arXiv:1910.14227  [pdf, other]

    stat.CO stat.ME

    Combined parameter and state inference with automatically calibrated ABC

    Authors: Anthony Ebert, Pierre Pudlo, Kerrie Mengersen, Paul Wu, Christopher Drovandi

    Abstract: State space models contain time-indexed parameters, termed states, as well as static parameters, simply termed parameters. The problem of inferring both static parameters as well as states simultaneously, based on time-indexed observations, is the subject of much recent literature. This problem is compounded once we consider models with intractable likelihoods. In these situations, some emerging a…

    Submitted 26 May, 2021; v1 submitted 30 October, 2019; originally announced October 2019.

  47. arXiv:1910.02379  [pdf, other]

    stat.AP

    Factors associated with injurious from falls in people with early stage Parkinson's disease

    Authors: Sarini Abdullah, James McGree, Nicole White, Kerrie Mengersen, Graham Kerr

    Abstract: Falls are common in people with Parkinson's disease (PD) and have detrimental effects which can lower the quality of life. While studies have been conducted to learn about falling in general, factors distinguishing injurious from non-injurious falls are less clear. A two-stage Bayesian logistic regression model was used to model the association of falls and injurious falls with data mea…

    Submitted 6 October, 2019; originally announced October 2019.

    Comments: 18 pages, 3 figures, 4 tables

    MSC Class: 62P10; 62-07; 62J12 ACM Class: J.3.2; G.3.2; G.3.6

  48. arXiv:1910.01864  [pdf, other]

    stat.AP

    Profile regression for subgrouping patients with early stage Parkinson's disease

    Authors: Sarini Abdullah, James McGree, Nicole White, Kerrie Mengersen, Graham Kerr

    Abstract: Falls are detrimental to people with Parkinson's Disease (PD) because of the potentially severe consequences to the patients' quality of life. While many studies have attempted to predict falls/non-falls, this study aimed to determine factors related to falls frequency in people with early PD. Ninety-nine participants with early stage PD were assessed based on two types of tests. The first type of…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: 30 pages, 11 figures, 4 tables

    MSC Class: 62-07; 62P10; 62H30 ACM Class: G.3.6; G.3.14; J.3.2

  49. arXiv:1910.01313  [pdf, other]

    stat.AP

    Assessing the predictive ability of the UPDRS for falls classification in early stage Parkinson's disease

    Authors: Sarini Abdullah, Nicole White, James McGree, Kerrie Mengersen, Graham Kerr

    Abstract: Identification of risk factors associated with falls in people with Parkinson's Disease (PD) is important due to their high risk of falling. In this study, various ways of utilizing the Unified Parkinson's Disease Rating Scale (UPDRS) were assessed for the identification of risk factors and for the prediction of falls. Three statistical methods for classification were considered: decision trees, ra…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: 29 pages, 7 figures, 5 tables

    MSC Class: 62P10; 62-07; 62H30 ACM Class: G.3.6; G.3.7; J.3.2

  50. Estimating a novel stochastic model for within-field disease dynamics of banana bunchy top virus via approximate Bayesian computation

    Authors: Abhishek Varghese, Christopher Drovandi, Kerrie Mengersen, Antonietta Mira

    Abstract: The Banana Bunchy Top Virus (BBTV) is one of the most economically important vector-borne banana diseases throughout the Asia-Pacific Basin and presents a significant challenge to the agricultural sector. Current models of BBTV are largely deterministic, limited by an incomplete understanding of interactions in complex natural systems, and the appropriate identification of parameters. A stochastic…

    Submitted 16 March, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 40 pages, 16 figures

    MSC Class: 62P12