-
Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry
Authors:
Panyi Dong,
Zhiyu Quan,
Brandon Edwards,
Shih-han Wang,
Runhuan Feng,
Tianyang Wang,
Patrick Foley,
Prashant Shah
Abstract:
The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and limited data variety, which are caused by privacy concerns, the rarity of claim events, the lack of informative rating factors, etc. During each round of FL, collaborators compute improvements to the model using their local private data, and these insights are combined to update a global model. This aggregation of insights increases the effectiveness of claims loss forecasting compared with models trained individually at each collaborator. Critically, this approach enables machine learning collaboration without the need for raw data to leave the compute infrastructure of each respective data owner. Additionally, the open-source framework used in our experiments, OpenFL, is designed so that it can be run using confidential computing as well as with additional algorithmic protections against information leakage via the shared model updates. In this way, FL is implemented as a privacy-enhancing collaborative learning technique that addresses the challenges posed by the sensitivity and privacy of data in traditional machine learning solutions. This paper's application of FL can also be extended to other areas, including fraud detection and catastrophe modeling, that have a similar need to incorporate data privacy into machine learning collaborations. Our framework and empirical results provide a foundation for future collaborations among insurers, regulators, academic researchers, and InsurTech experts.
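The round structure described above (local improvement on private data, then aggregation into a global model) can be illustrated with a minimal sketch. It assumes a FedAvg-style weighted average by local example count and a toy one-parameter linear model; `federated_average`, `local_update`, and `fl_round` are hypothetical names, and this is not OpenFL's API or the report's actual claims model.

```python
from typing import List, Sequence, Tuple

def federated_average(updates: Sequence[Sequence[float]],
                      num_examples: Sequence[int]) -> List[float]:
    """Average collaborator parameter vectors, weighting each by its local example count."""
    if not updates or len(updates) != len(num_examples):
        raise ValueError("need one example count per collaborator update")
    total = sum(num_examples)
    dim = len(updates[0])
    global_params = [0.0] * dim
    for params, n in zip(updates, num_examples):
        if len(params) != dim:
            raise ValueError("all collaborators must share the same parameter shape")
        for k, value in enumerate(params):
            global_params[k] += (n / total) * value
    return global_params

def local_update(global_w: float, data: Sequence[Tuple[float, float]],
                 lr: float = 0.1, epochs: int = 5) -> float:
    """Hypothetical local training: a few gradient steps for y ~ w * x on private data."""
    w = global_w
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fl_round(global_w: float,
             collaborators: Sequence[Sequence[Tuple[float, float]]]) -> float:
    """One round: each collaborator trains locally; only the updated weight leaves the site."""
    local_weights = [local_update(global_w, data) for data in collaborators]
    sizes = [len(data) for data in collaborators]
    return federated_average([[w] for w in local_weights], sizes)[0]
```

For example, `federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])` returns `[2.5, 3.5]`: the second collaborator holds three times as much data, so its parameters get three times the weight. Note that only parameters cross site boundaries; the `(x, y)` pairs stay inside `local_update`.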
Submitted 22 February, 2024;
originally announced February 2024.
-
The State of Food Systems Worldwide: Counting Down to 2030
Authors:
Kate Schneider,
Jessica Fanzo,
Lawrence Haddad,
Mario Herrero,
Jose Rosero Moncayo,
Anna Herforth,
Roseline Reman,
Alejandro Guarin,
Danielle Resnick,
Namukolo Covic,
Christophe Béné,
Andrea Cattaneo,
Nancy Aburto,
Ramya Ambikapathi,
Destan Aytekin,
Simon Barquera,
Jane Battersby-Lennard,
Ty Beal,
Paulina Bizzoto Molina,
Carlo Cafiero,
Christine Campeau,
Patrick Caron,
Piero Conforti,
Kerstin Damerau,
Michael DiGirolamo, et al. (32 additional authors not shown)
Abstract:
Transforming food systems is essential to bring about a healthier, more equitable, sustainable, and resilient future, including achieving global development and sustainability goals. To date, no comprehensive framework exists to track food systems transformation and its contributions to global goals. In 2021, the Food Systems Countdown to 2030 Initiative (FSCI) articulated an architecture to monitor food systems across five themes: (1) diets, nutrition, and health; (2) environment, natural resources, and production; (3) livelihoods, poverty, and equity; (4) governance; and (5) resilience and sustainability. Each theme comprises three to five indicator domains. This paper builds on that architecture, presenting the inclusive, consultative process used to select indicators and an application of the indicator framework using the latest available data, constructing the first global food systems baseline to track transformation. While data are available to cover most themes and domains, critical indicator gaps exist in areas such as off-farm livelihoods, food loss and waste, and governance. Baseline results demonstrate that every region or country can claim positive outcomes in some parts of its food systems, but none is optimal across all domains, and some indicators are independent of national income. These results underscore the need for dedicated monitoring and transformation agendas specific to food systems. Tracking these indicators to 2030 and beyond will allow for data-driven food systems governance at all scales and increase accountability for urgently needed progress toward achieving global goals.
Submitted 29 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative for training accurate and generalizable ML models by sharing only numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement for the tumor's entire extent. We anticipate that our study will: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery
Authors:
Odhran O'Donoghue,
Paul Duckworth,
Giuseppe Ughi,
Linus Scheibenreif,
Kia Khezeli,
Adrienne Hoarfrost,
Samuel Budd,
Patrick Foley,
Nicholas Chia,
John Kalantari,
Graham Mackintosh,
Frank Soboczenski,
Lauren Sanders
Abstract:
Human medical data can be challenging to obtain due to data privacy concerns, difficulties conducting certain types of experiments, or prohibitive associated costs. In many settings, data from animal models or in-vitro cell lines are available to help augment our understanding of human data. However, such data are known to have low etiological validity compared with human data. In this work, we augment small human medical datasets with in-vitro data and animal models. We use Invariant Risk Minimisation (IRM) to elucidate invariant features by treating cross-organism data as belonging to different data-generating environments. Our models identify genes of relevance to human cancer development. We observe a degree of consistency across varying amounts of human and mouse data; however, further work is required to obtain conclusive insights. As a secondary contribution, we enhance existing open source datasets and provide two uniformly processed, cross-organism, homologue gene-matched datasets to the community.
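Treating each organism as a separate environment means the IRM objective sums a per-environment risk with a penalty that is zero only when the same predictor is simultaneously optimal in every environment. The sketch below uses the IRMv1 penalty with a squared-error loss and a linear predictor; the loss, the linear model, and the environment names `"human"` and `"mouse"` are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, target)

def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor; a stand-in for whatever model is actually trained."""
    return sum(w * xi for w, xi in zip(weights, x))

def risk_and_penalty(weights: Sequence[float],
                     data: List[Example]) -> Tuple[float, float]:
    """Squared-error risk for one environment plus the IRMv1 penalty."""
    preds = [predict(weights, x) for x, _ in data]
    targets = [y for _, y in data]
    n = len(data)
    risk = sum((p - y) ** 2 for p, y in zip(preds, targets)) / n
    # IRMv1: gradient of the risk w.r.t. a scalar multiplier s on the predictions,
    # evaluated at s = 1: d/ds mean((s*p - y)^2) at s=1 equals mean(2 * (p - y) * p).
    grad_s = sum(2 * (p - y) * p for p, y in zip(preds, targets)) / n
    return risk, grad_s ** 2

def irm_objective(weights: Sequence[float],
                  environments: Dict[str, List[Example]],
                  lam: float) -> float:
    """Sum over environments of (risk + lam * penalty)."""
    total = 0.0
    for data in environments.values():
        risk, penalty = risk_and_penalty(weights, data)
        total += risk + lam * penalty
    return total
```

With `envs = {"human": [((1.0, 0.0), 2.0), ((2.0, 1.0), 4.0)], "mouse": [((1.0, 3.0), 2.0)]}`, the weights `[2.0, 0.0]` fit both environments exactly and `irm_objective([2.0, 0.0], envs, lam=10.0)` is `0.0`. Larger `lam` pushes optimization toward features whose relationship to the target holds across organisms.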
Submitted 13 February, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
OpenFL: An open-source framework for Federated Learning
Authors:
G Anthony Reina,
Alexey Gruzdev,
Patrick Foley,
Olga Perepelkina,
Mansi Sharma,
Igor Davidyuk,
Ilya Trushkin,
Maksim Radionov,
Aleksandr Mokrov,
Dmitry Agapov,
Jason Martin,
Brandon Edwards,
Micah J. Sheller,
Sarthak Pati,
Prakash Narayana Moorthy,
Shih-han Wang,
Prashant Shah,
Spyridon Bakas
Abstract:
Federated learning (FL) is a computational paradigm that enables organizations to collaborate on machine learning (ML) projects without sharing sensitive data, such as patient records, financial data, or classified secrets. Open Federated Learning (OpenFL https://github.com/intel/openfl) is an open-source framework for training ML algorithms using the data-private collaborative learning paradigm of FL. OpenFL works with training pipelines built with both TensorFlow and PyTorch, and can be easily extended to other ML and deep learning frameworks. Here, we summarize the motivation and development characteristics of OpenFL, with the intention of facilitating its application to existing ML model training in a production environment. Finally, we describe the first use of the OpenFL framework to train consensus ML models in a consortium of international healthcare organizations, as well as how it facilitates the first computational competition on FL.
Submitted 13 May, 2021;
originally announced May 2021.
-
The Federated Tumor Segmentation (FeTS) Challenge
Authors:
Sarthak Pati,
Ujjwal Baid,
Maximilian Zenk,
Brandon Edwards,
Micah Sheller,
G. Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Jason Martin,
Shadi Albarqouni,
Yong Chen,
Russell Taki Shinohara,
Annika Reinke,
David Zimmerer,
John B. Freymann,
Justin S. Kirby,
Christos Davatzikos,
Rivka R. Colen,
Aikaterini Kotrotsou,
Daniel Marcus,
Mikhail Milchenko,
Arash Nazeri,
Hassan Fathallah-Shaykh,
Roland Wiest,
Andras Jakab, et al. (7 additional authors not shown)
Abstract:
This manuscript describes the first challenge on Federated Learning, namely the Federated Tumor Segmentation (FeTS) challenge 2021. International challenges have become the standard for validation of biomedical image analysis methods. However, the actual performance of participating (even the winning) algorithms on "real-world" clinical data often remains unclear, as the data included in challenges are usually acquired in very controlled settings at few institutions. The seemingly obvious solution of simply collecting more data from more institutions in such challenges does not scale well due to privacy and ownership hurdles. Towards alleviating these concerns, we propose the FeTS challenge 2021 to address both the development and the evaluation of models for the segmentation of intrinsically heterogeneous (in appearance, shape, and histology) brain tumors, namely gliomas. Specifically, the FeTS 2021 challenge uses clinically acquired, multi-institutional magnetic resonance imaging (MRI) scans from the BraTS 2020 challenge, as well as from various remote independent institutions included in the collaborative network of a real-world federation (https://www.fets.ai/). The goals of the FeTS challenge are directly represented by the two included tasks: 1) the identification of the optimal weight aggregation approach for training a consensus model that has gained knowledge via federated learning from multiple geographically distinct institutions, while their data always remain within each institution, and 2) the federated evaluation of the generalizability of brain tumor segmentation models "in the wild", i.e. on data from institutional distributions that were not part of the training datasets.
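Task 1 asks participants to compare weight aggregation approaches. As one illustrative candidate (an assumption for exposition, not a method prescribed or reported by the challenge), a coordinate-wise median replaces the weighted mean of collaborator parameters and is less sensitive to a single outlying institution. `coordinate_median` is a hypothetical name.

```python
import statistics
from typing import List, Sequence

def coordinate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate collaborator parameter vectors by taking the median of each coordinate."""
    if not updates:
        raise ValueError("no updates to aggregate")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    return [statistics.median(u[k] for u in updates) for k in range(dim)]
```

For instance, `coordinate_median([[1.0, 10.0], [2.0, 20.0], [100.0, 30.0]])` yields `[2.0, 20.0]`, whereas a plain mean of the first coordinate would be pulled to about 34.3 by the third update. The trade-off is that the median ignores differences in local dataset size.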
Submitted 13 May, 2021; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Lensed Image Angles: New Statistical Evidence for Substructure
Authors:
Liliya L. R. Williams,
Patrick Foley,
Damon Farnsworth,
Jason Belter
Abstract:
We introduce a novel statistical method for analyzing the projected mass distribution in galaxy lenses based solely on the angular distribution of images in quads around the lens center. The method requires knowledge of the lens center location, but the images' distances from the lens center are not used at all. If the images of a quad are numbered in order of arrival time, θ_1 through θ_4, and θ_{ij} is the angle between images i and j, then we define the 'bisector' plane whose axes are linear combinations of θ_{23} and θ_{14}. The bisector plane of a given lens contains all the quads produced by the lens. We show empirically that all two-fold symmetric lenses with convex (i.e., neither wavy nor petal-like) isodensity contours are identical in the bisector plane of their quads. We also study lenses with twisting isodensity contours, lumpy substructure, etc. Our results suggest that to reproduce the general characteristics of the observed quad population, kpc-scale substructure must be a common feature of galaxy lenses.
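The angles θ_{ij} depend only on the image directions as seen from the lens center, which is why image distances drop out. The sketch below computes all six pairwise angles for a quad, assuming image positions are given in the lens-plane coordinates as `(x, y)` pairs already ordered by arrival time. The specific linear combinations of θ_{23} and θ_{14} that define the bisector-plane axes are not stated in this abstract, so they are deliberately not reproduced; `image_angle` and `quad_angles` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

def image_angle(center: Point, p: Point, q: Point) -> float:
    """Angle in radians, in [0, pi], between images p and q as seen from the lens center."""
    ax, ay = p[0] - center[0], p[1] - center[1]
    bx, by = q[0] - center[0], q[1] - center[1]
    # atan2(cross, dot) is numerically stable; only directions matter, not distances.
    return abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))

def quad_angles(center: Point, images: Sequence[Point]) -> Dict[Tuple[int, int], float]:
    """All theta_ij for a quad whose images are ordered by arrival time (1..4)."""
    if len(images) != 4:
        raise ValueError("a quad has exactly four images")
    return {
        (i + 1, j + 1): image_angle(center, images[i], images[j])
        for i in range(4)
        for j in range(i + 1, 4)
    }
```

For a symmetric cross of images at `(1, 0)`, `(0, 1)`, `(-1, 0)`, `(0, -1)` around the origin, θ_{13} is π and θ_{23} and θ_{14} are both π/2. Scaling any image's distance from the center leaves every angle unchanged.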
Submitted 17 June, 2008;
originally announced June 2008.