
Showing 1–6 of 6 results for author: Wilde, H

Searching in archive cs.
  1. arXiv:2211.11540 [pdf, other]

    cs.CR

    A Framework for Auditable Synthetic Data Generation

    Authors: Florimond Houssiau, Samuel N. Cohen, Lukasz Szpruch, Owen Daniel, Michaela G. Lawrence, Robin Mitra, Henry Wilde, Callum Mole

    Abstract: Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not give the data controller control over what statistical patterns are captured, leading to concerns over privacy protection. While synthetic records are not linked to a particular real-world individual, they c…

    Submitted 21 November, 2022; originally announced November 2022.

  2. arXiv:2203.01363 [pdf, other]

    cs.LG stat.AP

    Faking feature importance: A cautionary tale on the use of differentially-private synthetic data

    Authors: Oscar Giles, Kasra Hosseini, Grigorios Mingas, Oliver Strickson, Louise Bowler, Camila Rangel Smith, Harrison Wilde, Jen Ning Lim, Bilal Mateen, Kasun Amarasinghe, Rayid Ghani, Alison Heppenstall, Nik Lomax, Nick Malleson, Martin O'Reilly, Sebastian Vollmer

    Abstract: Synthetic datasets are often presented as a silver-bullet solution to the problem of privacy-preserving data publishing. However, for many applications, synthetic data has been shown to have limited utility when used to train predictive models. One promising potential application of these data is in the exploratory phase of the machine learning workflow, which involves understanding, engineering a…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 27 pages, 8 figures

  3. arXiv:2108.10934 [pdf, other]

    stat.ML cs.CR cs.LG

    Mitigating Statistical Bias within Differentially Private Synthetic Data

    Authors: Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes

    Abstract: Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for generating private synthetic data from undisclosed real data. However, mechanisms of privacy preservation can significantly reduce the utility of synthetic data, which in turn impacts downstream tasks such as learning predictive models or inference. We propose several re-weighting strategies using…

    Submitted 19 May, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

  4. arXiv:2011.08299 [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    Foundations of Bayesian Learning from Synthetic Data

    Authors: Harrison Wilde, Jack Jewson, Sebastian Vollmer, Chris Holmes

    Abstract: There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for sit…

    Submitted 24 November, 2020; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: 43 pages (10 main text, 33 supplement), 32 figures (4 main text, 28 supplement)

  5. arXiv:2002.02701 [pdf, other]

    cs.LG cs.GT stat.ML

    A novel initialisation based on hospital-resident assignment for the k-modes algorithm

    Authors: Henry Wilde, Vincent Knight, Jonathan Gillard

    Abstract: This paper presents a new way of selecting an initial solution for the k-modes algorithm that allows for a notion of mathematical fairness and a leverage of the data that the common initialisations from literature do not. The method, which utilises the Hospital-Resident Assignment Problem to find the set of initial cluster centroids, is compared with the current initialisations on both benchmark d…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: 23 pages, 11 figures (31 panels)

  6. arXiv:1907.13508 [pdf, other]

    cs.DS cs.NE

    Evolutionary Dataset Optimisation: learning algorithm quality through evolution

    Authors: Henry Wilde, Vincent Knight, Jonathan Gillard

    Abstract: In this paper we propose a novel method for learning how algorithms perform. Classically, algorithms are compared on a finite number of existing (or newly simulated) benchmark datasets based on some fixed metrics. The algorithm(s) with the smallest value of this metric are chosen to be the 'best performing'. We offer a new approach to flip this paradigm. We instead aim to gain a richer picture of…

    Submitted 31 October, 2019; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: 33 pages, 15 figures