-
Towards Reducing Diagnostic Errors with Interpretable Risk Prediction
Authors:
Denis Jered McInerney,
William Dickinson,
Lucy C. Flynn,
Andrea C. Young,
Geoffrey S. Young,
Jan-Willem van de Meent,
Byron C. Wallace
Abstract:
Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.
Submitted 19 March, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Enabling Cross-Language Data Integration and Scalable Analytics in Decentralized Finance
Authors:
Conor Flynn,
Kristin P. Bennett,
John S. Erickson,
Aaron Green,
Oshani Seneviratne
Abstract:
With the agile development process of most academic and corporate entities, designing a robust computational back-end system that can support their ever-changing data needs is a constantly evolving challenge. We propose the implementation of a data and language-agnostic system design that handles different data schemes and sources while subsequently providing researchers and developers a way to connect to it that is supported by a vast majority of programming languages. To validate the efficacy of a system with this proposed architecture, we integrate various data sources throughout the decentralized finance (DeFi) space, specifically from DeFi lending protocols, retrieving tens of millions of data points to perform analytics through this system. We then access and process the retrieved data through several different programming languages (R-Lang, Python, and Java). Finally, we analyze the performance of the proposed architecture in relation to other high-performance systems and explore how this system performs under a high computational load.
Submitted 3 November, 2023;
originally announced November 2023.
-
Assessing Scientific Contributions in Data Sharing Spaces
Authors:
Kacy Adams,
Fernando Spadea,
Conor Flynn,
Oshani Seneviratne
Abstract:
In the present academic landscape, the process of collecting data is slow, and the lax infrastructures for data collaborations lead to significant delays in coming up with and disseminating conclusive findings. Therefore, there is an increasing need for a secure, scalable, and trustworthy data-sharing ecosystem that promotes and rewards collaborative data-sharing efforts among researchers, and a robust incentive mechanism is required to achieve this objective. Reputation-based incentives, such as the h-index, have historically played a pivotal role in the academic community. However, the h-index suffers from several limitations. This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions. Utilizing the Microsoft Academic Graph and machine learning techniques, the SCIENCE-index predicts the progress made by a researcher over their career and provides a soft incentive for sharing their datasets with peer researchers. To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter. DataCite, a database of openly available datasets, proxies this parameter, which is further enhanced by including a researcher's data-sharing activity. Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index. We observe that it results in a much more even spread of evaluations. The SCIENCE-index is a crucial component in constructing a decentralized protocol that promotes trust-based data sharing, addressing the current inequity in dataset sharing. The work outlined in this paper provides the foundation for assessing scientific contributions in future data-sharing spaces powered by decentralized applications.
Submitted 18 March, 2023;
originally announced March 2023.
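For readers unfamiliar with the baseline metric that the abstract above compares against, the following is a minimal sketch of the standard h-index definition: the largest h such that a researcher has at least h papers with at least h citations each. The function name `h_index` and the sample citation counts are illustrative assumptions; this is not the SCIENCE-index, whose formula the abstract does not give.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for one researcher.
    print(h_index([10, 8, 5, 4, 3]))
```

Because the h-index depends only on raw citation counts, it carries no information about data sharing, which is the gap the abstract says the SCIENCE-index is meant to address.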
-
Towards Algorithmic Fairness in Space-Time: Filling in Black Holes
Authors:
Cheryl Flynn,
Aritra Guha,
Subhabrata Majumdar,
Divesh Srivastava,
Zhengyi Zhou
Abstract:
New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness of the health implications for minority populations stemming from historical redlining practices; and studies have found varying quality and coverage in the collection and sharing of open-source geospatial data. Despite the extensive literature on machine learning (ML) fairness, few algorithmic strategies have been proposed to mitigate such biases. In this paper we highlight the unique challenges for quantifying and addressing spatio-temporal biases, through the lens of use cases presented in the scientific literature and media. We envision a roadmap of ML strategies that need to be developed or adapted to quantify and overcome these challenges -- including transfer learning, active learning, and reinforcement learning techniques. Further, we discuss the potential role of ML in providing guidance to policy makers on issues related to spatial fairness.
Submitted 8 November, 2022;
originally announced November 2022.
-
Detecting Bias in the Presence of Spatial Autocorrelation
Authors:
Subhabrata Majumdar,
Cheryl Flynn,
Ritwik Mitra
Abstract:
In spite of considerable practical importance, current algorithmic fairness literature lacks technical methods to account for underlying geographic dependency while evaluating or mitigating bias issues for spatial data. We initiate the study of bias in spatial applications in this paper, taking the first step towards formalizing this line of quantitative methods. Bias in spatial data applications often gets confounded by underlying spatial autocorrelation. We propose hypothesis testing methodology to detect the presence and strength of this effect, then account for it by using a spatial filtering-based approach -- in order to enable application of existing bias detection metrics. We evaluate our proposed methodology through numerical experiments on real and synthetic datasets, demonstrating that in the presence of several types of confounding effects due to the underlying spatial structure our testing methods perform well in maintaining low type-II errors and nominal type-I errors.
Submitted 28 January, 2022; v1 submitted 5 January, 2021;
originally announced January 2021.
-
Local Dampening: Differential Privacy for Non-numeric Queries via Local Sensitivity
Authors:
Victor A. E. Farias,
Felipe T. Brito,
Cheryl Flynn,
Javam C. Machado,
Subhabrata Majumdar,
Divesh Srivastava
Abstract:
Differential privacy is the state-of-the-art formal definition for data release under strong privacy guarantees. A variety of mechanisms have been proposed in the literature for releasing the output of numeric queries (e.g., the Laplace mechanism and smooth sensitivity mechanism). Those mechanisms guarantee differential privacy by adding noise to the true query's output. The amount of noise added is calibrated by the notions of global sensitivity and local sensitivity of the query that measure the impact of the addition or removal of an individual on the query's output. Mechanisms that use local sensitivity add less noise and, consequently, produce more accurate answers. However, although there has been some work on generic mechanisms for releasing the output of non-numeric queries using global sensitivity (e.g., the Exponential mechanism), the literature lacks generic mechanisms for releasing the output of non-numeric queries using local sensitivity to reduce the noise in the query's output. In this work, we remedy this shortcoming and present the local dampening mechanism. We adapt the notion of local sensitivity for the non-numeric setting and leverage it to design a generic non-numeric mechanism. We provide theoretical comparisons to the exponential mechanism and show under which conditions the local dampening mechanism is more accurate than the exponential mechanism. We illustrate the effectiveness of the local dampening mechanism by applying it to three diverse problems: (i) Percentile selection: we report the p-th element in the database; (ii) Influential node analysis: given an influence metric, we release the top-k most influential nodes while preserving the privacy of the relationship between nodes in the network; (iii) Decision tree induction: we provide a private adaptation of the ID3 algorithm to build decision trees from a given tabular dataset.
Submitted 14 April, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
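The abstract above names two baseline mechanisms calibrated by global sensitivity: the Laplace mechanism for numeric queries and the exponential mechanism for non-numeric ones. The following is a minimal sketch of their textbook forms, using only the Python standard library. The function names, parameters, and example data are illustrative assumptions, and this is not the local dampening mechanism proposed in the paper.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release a numeric answer plus Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Sample a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)
    # Subtracting the top score leaves the distribution unchanged and avoids overflow.
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical counting query: adding or removing one person changes it by at most 1.
    print(laplace_mechanism(120, sensitivity=1.0, epsilon=0.5, rng=rng))
    # Hypothetical non-numeric query: pick the most common label, utility = its count.
    counts = {"a": 40, "b": 35, "c": 5}
    print(exponential_mechanism(list(counts), counts.get, 1.0, 0.5, rng))
```

Both functions take a worst-case (global) sensitivity. According to the abstract, mechanisms that use local sensitivity add less noise, and the local dampening mechanism carries that idea over to the non-numeric setting.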
-
Towards Integrating Fairness Transparently in Industrial Applications
Authors:
Emily Dodwell,
Cheryl Flynn,
Balachander Krishnamurthy,
Subhabrata Majumdar,
Ritwik Mitra
Abstract:
Numerous Machine Learning (ML) bias-related failures in recent years have led to scrutiny of how companies incorporate aspects of transparency and accountability in their ML lifecycles. Companies have a responsibility to monitor ML processes for bias and mitigate any bias detected, ensure business product integrity, preserve customer loyalty, and protect brand image. Challenges specific to industry ML projects can be broadly categorized into principled documentation, human oversight, and need for mechanisms that enable information reuse and improve cost efficiency. We highlight specific roadblocks and propose conceptual solutions on a per-category basis for ML practitioners and organizational subject matter experts. Our systematic approach tackles these challenges by integrating mechanized and human-in-the-loop components in bias detection, mitigation, and documentation of projects at various stages of the ML lifecycle. To motivate the implementation of our system -- SIFT (System to Integrate Fairness Transparently) -- we present its structural primitives with an example real-world use case on how it can be used to identify potential biases and determine appropriate mitigation strategies in a participatory manner.
Submitted 13 February, 2021; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Authors:
Miles Brundage,
Shahar Avin,
Jasmine Wang,
Haydn Belfield,
Gretchen Krueger,
Gillian Hadfield,
Heidy Khlaaf,
Jingying Yang,
Helen Toner,
Ruth Fong,
Tegan Maharaj,
Pang Wei Koh,
Sara Hooker,
Jade Leung,
Andrew Trask,
Emma Bluemke,
Jonathan Lebensold,
Cullen O'Keefe,
Mark Koren,
Théo Ryffel,
JB Rubinovitz,
Tamay Besiroglu,
Federica Carugati,
Jack Clark,
Peter Eckersley
, et al. (34 additional authors not shown)
Abstract:
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.
Submitted 20 April, 2020; v1 submitted 15 April, 2020;
originally announced April 2020.
-
The Windfall Clause: Distributing the Benefits of AI for the Common Good
Authors:
Cullen O'Keefe,
Peter Cihon,
Ben Garfinkel,
Carrick Flynn,
Jade Leung,
Allan Dafoe
Abstract:
As the transformative potential of AI has become increasingly salient as a matter of public and political interest, there has been growing discussion about the need to ensure that AI broadly benefits humanity. This in turn has spurred debate on the social responsibilities of large technology companies to serve the interests of society at large. In response, ethical principles and codes of conduct have been proposed to meet the escalating demand for this responsibility to be taken seriously. As yet, however, few institutional innovations have been suggested to translate this responsibility into legal commitments which apply to companies positioned to reap large financial gains from the development and use of AI. This paper offers one potentially attractive tool for addressing such issues: the Windfall Clause, which is an ex ante commitment by AI firms to donate a significant amount of any eventual extremely large profits. By this we mean an early commitment that profits that a firm could not earn without achieving fundamental, economically transformative breakthroughs in AI capabilities will be donated to benefit humanity broadly, with particular attention towards mitigating any downsides from deployment of windfall-generating AI.
Submitted 24 January, 2020; v1 submitted 25 December, 2019;
originally announced December 2019.
-
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
Authors:
Miles Brundage,
Shahar Avin,
Jack Clark,
Helen Toner,
Peter Eckersley,
Ben Garfinkel,
Allan Dafoe,
Paul Scharre,
Thomas Zeitzoff,
Bobby Filar,
Hyrum Anderson,
Heather Roff,
Gregory C. Allen,
Jacob Steinhardt,
Carrick Flynn,
Seán Ó hÉigeartaigh,
Simon Beard,
Haydn Belfield,
Sebastian Farquhar,
Clare Lyle,
Rebecca Crootof,
Owain Evans,
Michael Page,
Joanna Bryson,
Roman Yampolskiy
, et al. (1 additional author not shown)
Abstract:
This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.
Submitted 20 February, 2018;
originally announced February 2018.
-
On the Impossibility of Supersized Machines
Authors:
Ben Garfinkel,
Miles Brundage,
Daniel Filan,
Carrick Flynn,
Jelena Luketina,
Michael Page,
Anders Sandberg,
Andrew Snyder-Beattie,
Max Tegmark
Abstract:
In recent years, a number of prominent computer scientists, along with academics in fields such as philosophy and physics, have lent credence to the notion that machines may one day become as large as humans. Many have further argued that machines could even come to exceed human size by a significant margin. However, there are at least seven distinct arguments that preclude this outcome. We show that it is not only implausible that machines will ever exceed human size, but in fact impossible.
Submitted 31 March, 2017;
originally announced March 2017.
-
Composing Differential Privacy and Secure Computation: A case study on scaling private record linkage
Authors:
Xi He,
Ashwin Machanavajjhala,
Cheryl Flynn,
Divesh Srivastava
Abstract:
Private record linkage (PRL) is the problem of identifying pairs of records that are similar as per an input matching rule from databases held by two parties that do not trust one another. We identify three key desiderata that a PRL solution must ensure: 1) perfect precision and high recall of matching pairs, 2) a proof of end-to-end privacy, and 3) communication and computational costs that scale subquadratically in the number of input records. We show that all of the existing solutions for PRL - including secure 2-party computation (S2PC), and their variants that use non-private or differentially private (DP) blocking to ensure subquadratic cost - violate at least one of the three desiderata. In particular, S2PC techniques guarantee end-to-end privacy but have either low recall or quadratic cost. In contrast, no end-to-end privacy guarantee has been formalized for solutions that achieve subquadratic cost. This is true even for solutions that compose DP and S2PC: DP does not permit the release of any exact information about the databases, while S2PC algorithms for PRL allow the release of matching records.
In light of this deficiency, we propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of DP, but allows for the truthful release of the output of a certain function applied to the data. We apply this to PRL, and show that protocols satisfying this privacy model permit the disclosure of the true matching records, but their execution is insensitive to the presence or absence of a single non-matching record. We find that prior work that combines DP and S2PC techniques fails to satisfy even this end-to-end privacy model. Hence, we develop novel protocols that provably achieve this end-to-end privacy guarantee, together with the other two desiderata of PRL.
Submitted 1 September, 2017; v1 submitted 1 February, 2017;
originally announced February 2017.