1 Introduction

Recent years have witnessed a surge of interest in healthcare data analytics, as more and more such data become readily available from various sources, including clinical institutions, individual patients, insurance companies, and the pharmaceutical industry, among others. This provides an unprecedented opportunity to develop computational techniques that extract data-driven insights for improving the quality of care delivery [72, 105].

Healthcare data are typically fragmented because of the complicated nature of the healthcare system and its processes. For example, different hospitals may be able to access the clinical records of their own patient populations only. These records are highly sensitive and contain protected health information (PHI) of individuals. Rigorous regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) [32], have been developed to regulate the process of accessing and analyzing such data. This creates a major challenge for modern data mining and machine learning (ML) technologies, such as deep learning [61], which typically require a large amount of training data.

Federated learning is a paradigm that has recently surged in popularity, as it holds great promise for learning with fragmented sensitive data. Instead of aggregating data from different places all together, or relying on the traditional discovery-then-replication design, it enables training a shared global model with a central server while keeping the data in the local institutions where they originate.

The term “federated learning” is not new. In 1976, Patrick Hill, a philosophy professor, first developed the Federated Learning Community (FLC) to bring people together to learn jointly, which helped students overcome the anonymity and isolation of large research universities [42]. Subsequently, several efforts aimed at building federations of learning content and content repositories [6, 74, 83]. In 2005, Rehak et al. [83] developed a reference model describing how to establish an interoperable repository infrastructure by creating federations of repositories, where the metadata are collected from the contributing repositories into a central registry that provides a single point of discovery and access. The ultimate goal of this model is to enable learning from diverse content repositories. These practices in federated learning communities and federated search services have provided useful references for the development of federated learning algorithms.

Federated learning holds great promise for healthcare data analytics. For both provider-based applications (e.g., building a model for predicting hospital readmission risk from patient Electronic Health Records (EHR) [71]) and consumer (patient)-based applications (e.g., screening for atrial fibrillation with electrocardiograms captured by a smartwatch [79]), the sensitive patient data can stay either in local institutions or with individual consumers, without leaving them during the federated model learning process, which effectively protects patient privacy. The goal of this paper is to review the setup of federated learning, discuss the general solutions and challenges, and envision its applications in healthcare.

In this review, after a formal overview of federated learning, we summarize the main challenges and recent progress in this field. We then illustrate the potential of federated learning methods in healthcare by describing successful recent research. Finally, we discuss the main opportunities and open questions for future applications in healthcare.

Difference with Existing Reviews

There have been a few review articles on federated learning recently. For example, Yang et al. [109] wrote an early federated learning survey summarizing the general privacy-preserving techniques that can be applied to federated learning. Some researchers surveyed sub-problems of federated learning, e.g., personalization techniques [59], semi-supervised learning algorithms [49], threat models [68], and mobile edge networks [66]. Kairouz et al. [51] discussed recent advances and presented an extensive collection of open problems and challenges. Li et al. [63] reviewed federated learning from a system viewpoint. Different from those reviews, this paper focuses on the potential of federated learning in healthcare. We summarize the general solutions to the challenges in the federated learning scenario and survey a set of representative federated learning methods for healthcare. In the last part of this review, we outline some directions and open questions in federated learning for healthcare. An early version of this paper is available on arXiv [107].

2 Federated Learning

Federated learning is the problem of training a high-quality shared global model with a central server from decentralized data scattered among a large number of different clients (Fig. 1). Mathematically, assume there are K activated clients on which the data reside (a client could be a mobile phone, a wearable device, a clinical institution data warehouse, etc.). Let \(\mathcal {D}_{k}\) denote the data distribution associated with client k and \(n_{k}\) the number of samples available from that client, so that \(n = {\sum }_{k=1}^{K} n_{k}\) is the total sample size. The federated learning problem boils down to solving an empirical risk minimization problem of the form [56, 57, 69]:

$$ \min\limits_{\mathbf{w}\in\mathbb{R}^{d}} F(\mathbf{w}):=\sum\limits_{k=1}^{K} \frac{n_{k}}{n}F_{k}(\mathbf{w})\ \ \ \ \text{where}\ \ \ F_{k}(\mathbf{w}):=\frac{1}{n_{k}}\sum\limits_{\mathbf{x}_{i}\in\mathcal{D}_{k}}f_{i}(\mathbf{w}), \tag{1} $$

where w is the model parameter to be learned. The function \(f_{i}\) is specified via a loss function that depends on an input-output pair \(\{\mathbf {x}_{i}, y_{i}\}\). Typically, \(\mathbf {x}_{i}\in \mathbb {R}^{d}\) and \(y_{i}\in \mathbb {R}\) or \(y_{i}\in \{-1, 1\}\). Simple examples include:

  • linear regression: \(f_{i}(\mathbf {w})=\frac {1}{2}(\mathbf {x}_{i}^{\top }\mathbf {w}-y_{i})^{2}\), \(y_{i}\in \mathbb {R}\);

  • logistic regression: \(f_{i}(\mathbf {w})=\log (1+\exp (-y_{i}\mathbf {x}_{i}^{\top }\mathbf {w}))\), \(y_{i}\in \{-1,1\}\);

  • support vector machines: \(f_{i}(\mathbf {w})=\max \{0, 1-y_{i}\mathbf {x}_{i}^{\top }\mathbf {w}\}\), \(y_{i}\in \{-1,1\}\).
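
As a minimal sketch of the objective in Eq. 1, the following Python code evaluates F(w) for the three example losses on toy data. The function names and the representation of each client as a list of (x, y) pairs are illustrative assumptions, and the logistic loss is written in its usual nonnegative form log(1 + exp(-y x^T w)).

```python
import math

def dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def squared_loss(w, x, y):
    # linear regression: 1/2 (x^T w - y)^2
    return 0.5 * (dot(x, w) - y) ** 2

def logistic_loss(w, x, y):
    # logistic regression: log(1 + exp(-y x^T w)), evaluated in a stable form
    margin = y * dot(x, w)
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))

def hinge_loss(w, x, y):
    # support vector machine: max{0, 1 - y x^T w}
    return max(0.0, 1.0 - y * dot(x, w))

def local_risk(w, data, loss):
    # F_k(w): average loss over the n_k samples held by one client
    return sum(loss(w, x, y) for x, y in data) / len(data)

def global_objective(w, clients, loss):
    # F(w): local risks weighted by n_k / n
    n = sum(len(data) for data in clients)
    return sum(len(data) / n * local_risk(w, data, loss) for data in clients)

# Toy setup (hypothetical): two clients holding 1 and 2 samples.
clients = [[([1.0], 1.0)], [([2.0], -1.0), ([0.0], 1.0)]]
print(global_objective([0.0], clients, hinge_loss))
```

Because the weights are n_k / n, F(w) coincides with the average loss over the pooled samples; the federated setting changes where the terms are computed, not the objective itself.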

In particular, algorithms for federated learning face a number of challenges [13, 96], specifically:

  • Statistical Challenge: The data distributions differ greatly among clients, i.e., \(\forall k\neq \tilde {k}\), we have \(\mathbb {E}_{\mathbf {x}_{i} \sim {\mathcal {D}_{k}}}[f_{i}(\mathbf {w};\mathbf {x}_{i})] \neq \mathbb {E}_{\mathbf {x}_{i} \sim \mathcal {D}_{\tilde {k}}} [f_{i}(\mathbf {w};\mathbf {x}_{i})]\). As a result, the data points available locally are far from being a representative sample of the overall distribution, i.e., \(\mathbb {E}_{\mathbf {x}_{i} \sim \mathcal {D}_{k}}[f_{i}(\mathbf {w};\mathbf {x}_{i})] \neq F(\mathbf {w})\).

  • Communication Efficiency: The number of clients K is large and can be much bigger than the average number of training samples stored on the activated clients, i.e., K ≫ (n/K).

  • Privacy and Security: Additional privacy protections are needed for unreliable participating clients. It is impossible to ensure all clients are equally reliable.

Next, we will survey, in detail, the existing federated learning related works on handling such challenges.

Fig. 1 Schematic of the federated learning framework. The model is trained in a distributed manner: the institutions periodically communicate their local updates to a central server to learn a global model; the central server aggregates the updates and sends back the parameters of the updated global model

2.1 Statistical Challenges of Federated Learning

The naive way to solve the federated learning problem is through Federated Averaging (FedAvg) [69]. It has been demonstrated to work with certain non-independent and identically distributed (non-IID) data by requiring all the clients to share the same model. However, FedAvg does not address the statistical challenge of strongly skewed data distributions. The performance of convolutional neural networks trained with the FedAvg algorithm can degrade significantly due to weight divergence [111]. Existing research on dealing with the statistical challenge of federated learning can be grouped into two fields, i.e., consensus solutions and pluralistic solutions.
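
To make the averaging step concrete, the sketch below runs a simplified FedAvg loop on a toy linear regression problem. It is an illustration under stated assumptions rather than the reference algorithm of [69]: all clients participate in every round, local training is plain SGD on the squared loss, and the helper names and toy data generator are hypothetical.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def local_sgd(w_global, data, lr, epochs):
    # Each client starts from the current global weights and trains locally.
    w = list(w_global)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fedavg_round(w_global, clients, lr=0.05, epochs=1):
    # The server averages the returned local models with weights n_k / n.
    n = sum(len(data) for data in clients)
    w_new = [0.0] * len(w_global)
    for data in clients:
        w_k = local_sgd(w_global, data, lr, epochs)
        weight = len(data) / n
        w_new = [acc + weight * v for acc, v in zip(w_new, w_k)]
    return w_new

def make_client(center, n_k):
    # Toy client (assumption): noiseless targets y = 2x + 1, with inputs
    # clustered around `center`, so clients see different input ranges.
    xs = [center - 0.5 + i / (n_k - 1) for i in range(n_k)]
    return [([x, 1.0], 2.0 * x + 1.0) for x in xs]

clients = [make_client(-1.0, 10), make_client(0.0, 20), make_client(1.0, 30)]
w = [0.0, 0.0]
for _ in range(200):
    w = fedavg_round(w, clients)
print(w)  # close to the slope 2 and intercept 1 that generated the data
```

In this toy problem a single model fits every client exactly, so averaging recovers it even though the clients observe different input ranges. The weight divergence discussed above arises when the local optima genuinely disagree.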

2.1.1 Consensus Solution

Most centralized models are trained on the aggregated training samples obtained from the samples drawn from the local clients [96, 111]. Intrinsically, the centralized model is trained to minimize the loss with respect to the uniform distribution [73]: \(\bar {\mathcal {D}}={\sum }_{k=1}^{K}\frac {n_{k}}{n}\mathcal {D}_{k}\), where \(\bar {\mathcal {D}}\) is the target data distribution for the learning model. However, this specific uniform distribution is not an adequate solution in most scenarios.

To address this issue, recently proposed solutions model the target distribution or force the data to adapt to the uniform distribution [73, 111]. Specifically, Mohri et al. [73] proposed a minimax optimization scheme, i.e., agnostic federated learning (AFL), where the centralized model is optimized for any possible target distribution formed by a mixture of the client distributions. This method has only been applied at small scales. In comparison, Li et al. [64] proposed q-Fair Federated Learning (q-FFL), which assigns higher weight to devices with poor performance, so that the variance of the accuracy distribution across the network is reduced. They empirically demonstrated the improved flexibility and scalability of q-FFL compared to AFL.
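
The difference between these objectives can be illustrated on a fixed vector of local losses. The sketch below is an assumption-laden simplification: the AFL objective is evaluated for the special case where the mixture weights range over the whole simplex (so the worst case is the largest local loss), and the q-FFL objective is taken in the form \({\sum }_{k} \frac{n_{k}}{n}\frac{1}{q+1}F_{k}^{q+1}(\mathbf {w})\). The function names are hypothetical.

```python
def weighted_average_objective(losses, sizes):
    # Standard objective of Eq. 1: local losses weighted by n_k / n.
    n = sum(sizes)
    return sum(n_k / n * loss for loss, n_k in zip(losses, sizes))

def afl_objective(losses):
    # Worst case over all mixtures of clients (full simplex assumed):
    # the maximum is attained by putting all mass on the worst client.
    return max(losses)

def qffl_objective(losses, sizes, q):
    # q = 0 recovers the weighted average; larger q emphasizes high losses.
    n = sum(sizes)
    return sum(n_k / n * loss ** (q + 1) / (q + 1)
               for loss, n_k in zip(losses, sizes))

def qffl_gradient_weights(losses, sizes, q):
    # Relative weight of each client's gradient, proportional to n_k * F_k^q.
    raw = [n_k * loss ** q for loss, n_k in zip(losses, sizes)]
    total = sum(raw)
    return [r / total for r in raw]

losses = [0.2, 0.4, 1.6]   # hypothetical local losses F_k(w)
sizes = [100, 100, 100]    # hypothetical sample counts n_k
print(qffl_gradient_weights(losses, sizes, q=0))
print(qffl_gradient_weights(losses, sizes, q=1))
```

With q = 0 every client contributes equally here, while q = 1 shifts most of the gradient weight to the client with the largest loss, which is the mechanism by which poorly served devices are emphasized.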

Another commonly used method is globally sharing a small portion of data between all the clients [75, 111]. The shared subset, distributed from the central server to the clients, is required to contain a uniform distribution over classes. In addition to handling the non-IID issue, sharing information about a small portion of trusted instances and noise patterns can guide the local agents to select a compact training subset, while the clients learn to add changes to selected data samples, in order to improve the test performance of the global model [38].

2.1.2 Pluralistic Solution

Generally, it is difficult to find a consensus solution w that is good for all components \(\mathcal {D}_{k}\). Instead of wastefully insisting on a consensus solution, many researchers choose to embrace this heterogeneity.

Multi-task learning (MTL) is a natural way to deal with data drawn from different distributions. Rather than learning a single global model, it directly captures relationships among non-IID and unbalanced data by leveraging the relatedness between them. In order to do this, it is necessary to target a particular way in which tasks are related, e.g., sharing sparsity, sharing low-rank structure, or graph-based relatedness. Recently, Smith et al. [96] empirically demonstrated this point on real-world federated datasets and proposed a novel method, MOCHA, to solve a general convex MTL problem while handling the system challenges at the same time. Later, Corinzia et al. [22] introduced VIRTUAL, an algorithm for federated multi-task learning with non-convex models. They consider the federation of central server and clients as a Bayesian network and perform training using approximated variational inference. This work bridges the frameworks of federated and transfer/continual learning.

The success of multi-task learning rests on whether the chosen relatedness assumptions hold. Compared to this, pluralism can be a critical tool for dealing with heterogeneous data without any additional or even low-order terms that depend on the relatedness as in MTL [28]. Eichner et al. [28] considered training in the presence of block-cyclic data and showed that a remarkably simple pluralistic approach can entirely resolve the source of data heterogeneity. When the component distributions are actually different, pluralism can outperform the “ideal” IID baseline.

2.2 Communication Efficiency of Federated Learning

In the federated learning setting, training data remain distributed over a large number of clients, each with unreliable and relatively slow network connections. For a naive synchronous protocol in federated learning [58, 96], the total number of bits required during uplink (clients → server) and downlink (server → clients) communication by each of the K clients during training is given by:

$$ \mathcal{B}^{\mathrm{up/down}}\in \mathcal{O}(U \times \underbrace{|\mathbf{w}|\times (H(\triangle\mathbf{w}^{\mathrm{up/down}})+\beta)}_{\text{update size}}) \tag{2} $$

where U is the total number of updates performed by each client, |w| is the size of the model, and \(H(\triangle \mathbf {w}^{\mathrm {up/down}})\) is the entropy of the weight updates exchanged during transmission. β is the difference between the true update size and the minimal update size (which is given by the entropy) [89]. Accordingly, we can consider three ways to reduce the communication cost: (a) reduce the number of clients K, (b) reduce the update size, (c) reduce the number of updates U. Starting from these three points, we can organize existing research on communication-efficient federated learning into four groups, i.e., model compression, client selection, updates reduction, and peer-to-peer learning (Fig. 2).
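
A back-of-the-envelope reading of Eq. 2 can be sketched as follows. The numbers are purely illustrative assumptions (a model with one million weights, 100 updates, and 32 or 8 bits per transmitted weight standing in for H + β), and the constant hidden by the O-notation is taken to be 1.

```python
def communication_bits(num_updates, model_size, bits_per_weight):
    # Eq. 2 with the hidden constant set to 1:
    # U * |w| * (H + beta), where bits_per_weight plays the role of H + beta.
    return num_updates * model_size * bits_per_weight

# Hypothetical setting: 100 updates of a model with 1,000,000 weights.
full_precision = communication_bits(100, 1_000_000, 32)
quantized = communication_bits(100, 1_000_000, 8)
fewer_updates = communication_bits(25, 1_000_000, 32)

print(full_precision / 8e6, "MB per client at 32 bits per weight")
print(quantized / 8e6, "MB per client at 8 bits per weight")
print(fewer_updates / 8e6, "MB per client with 4x fewer updates")
```

Under these assumptions, shrinking the update size by a factor of four and cutting the number of updates by a factor of four save the same number of bits per client, which is why strategies (b) and (c) are complementary.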

Fig. 2 Communication-efficient federated learning methods. Existing research on improving communication efficiency can be categorized into (a) model compression, (b) client selection, (c) updates reduction, and (d) peer-to-peer learning

2.2.1 Client Selection

The most natural and straightforward way to reduce the communication cost is to restrict the participating clients or to choose a fraction of parameters to be updated at each round. Shokri et al. [92] use the selective stochastic gradient descent protocol, where the selection can be completely random, or only the parameters whose current values are farther away from their local optima, i.e., those that have a larger gradient, are selected. Nishio et al. [75] proposed a new protocol referred to as FedCS, where the central server manages the resources of heterogeneous clients and determines which clients should participate in the current training task by analyzing the resource information of each client, such as wireless channel states, computational capacities, and the size of data resources relevant to the current task. Here, the server should decide how much data, energy, and CPU resources are used by the mobile devices such that the energy consumption, training latency, and bandwidth cost are minimized while meeting the requirements of the training tasks. To this end, Anh [5] proposes to use the Deep Q-Learning [102] technique, which enables the server to find the optimal data and energy management for the mobile devices participating in mobile crowd-machine learning through federated learning without any prior knowledge of network dynamics.
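
The two basic selection strategies mentioned above, random client sampling and uploading only the largest-magnitude gradient entries, can be sketched as follows. This is a simplified illustration with hypothetical function names and parameters, not the exact protocol of [92] or FedCS.

```python
import random

def sample_clients(num_clients, fraction, rng):
    # Pick a random subset of clients to participate in the current round.
    m = max(1, int(fraction * num_clients))
    return sorted(rng.sample(range(num_clients), m))

def select_largest_gradients(gradient, fraction):
    # Keep only the coordinates with the largest absolute gradient,
    # returned as a sparse {index: value} update.
    m = max(1, int(fraction * len(gradient)))
    ranked = sorted(range(len(gradient)),
                    key=lambda j: abs(gradient[j]), reverse=True)
    return {j: gradient[j] for j in sorted(ranked[:m])}

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
participants = sample_clients(num_clients=8, fraction=0.25, rng=rng)
sparse_update = select_largest_gradients([0.1, -2.0, 0.5, 0.0], fraction=0.5)
print(participants, sparse_update)
```

A resource-aware scheme such as FedCS would replace the uniform sampling with a choice driven by each client's reported channel state, compute capacity, and data size.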

2.2.2 Model Compression

The goal of model compression is to compress the exchanges between server and clients to reduce the uplink/downlink communication cost. The first way is through structured updates, where the update is directly learned from a restricted space parameterized using a smaller number of variables, e.g., sparse or low-rank updates [58], or more specifically, pruning the least useful connections in a network [37, 113], weight quantization [17, 89], and model distillation [43]. The second way is lossy compression, where a full model update is first learned and then compressed using a combination of quantization, random rotations, and subsampling before being sent to the server [2, 58]. The server then decodes the updates before performing the aggregation.
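
As a minimal sketch of the lossy route, the code below quantizes an update to a fixed number of bits per coordinate on the client and decodes it on the server. It uses deterministic uniform quantization as a simplifying assumption; the random rotations, subsampling, and probabilistic rounding discussed in the cited works are omitted, and the function names are hypothetical.

```python
def quantize(update, bits):
    # Map each coordinate to an integer code in [0, 2^bits - 1].
    lo, hi = min(update), max(update)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((u - lo) / scale) for u in update]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # Server-side decoding back to approximate real values.
    return [lo + c * scale for c in codes]

update = [-0.8, 0.1, 0.35, 1.2]            # hypothetical local update
codes, lo, scale = quantize(update, bits=8)
decoded = dequantize(codes, lo, scale)
max_error = max(abs(u - d) for u, d in zip(update, decoded))
print(codes, max_error)
```

Each coordinate is sent as an 8-bit code instead of a 32-bit float, plus two scalars per update, at the price of a reconstruction error of at most half a quantization step.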

Another approach is federated dropout, in which each client, instead of locally training an update to the whole global model, trains an update to a smaller sub-model [12]. These sub-models are subsets of the global model and, as such, the computed local updates have a natural interpretation as updates to the larger global model. Federated dropout not only reduces the downlink communication but also reduces the size of uplink updates. Moreover, the local computational cost is correspondingly reduced, since the local training procedure deals with parameters of smaller dimension.

2.2.3 Updates Reduction

Kamp et al. [52] proposed to average models dynamically depending on the utility of the communication, which leads to a reduction of communication by an order of magnitude compared to state-of-the-art approaches that communicate periodically. This approach is well suited for massively distributed systems with limited communication infrastructure. Bui et al. [11] improved federated learning for Bayesian neural networks using partitioned variational inference, where the client can decide to upload the parameters back to the central server after multiple passes through its data, after one local epoch, or after just one mini-batch. Guha et al. [35] focused on techniques for one-shot federated learning, in which a global model is learned from data in the network using only a single round of communication between the devices and the central server. Besides the above works, Ren et al. [84] theoretically analyzed the detailed expression of the learning efficiency in the CPU scenario and formulated a training acceleration problem under both communication and learning resource budgets. Reinforcement learning and round-robin learning are widely used to manage the communication and computation resources [5, 46, 106, 114].

2.2.4 Peer-to-Peer Learning

In federated learning, a central server is required to coordinate the training process of the global model. However, the communication cost to the central server may not be affordable, since a large number of clients are usually involved. Also, many practical peer-to-peer networks are dynamic, and it is not possible to regularly access a fixed central server. Moreover, because of the dependence on the central server, all clients are required to agree on one trusted central body, whose failure would interrupt the training process for all clients. Therefore, some researchers began to study fully decentralized frameworks where the central server is not required [41, 60, 85, 91]. The local clients are distributed over a graph/network where they only communicate with their one-hop neighbors. Each client updates its local belief based on its own data and then aggregates information from its one-hop neighbors.
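
The neighbor-aggregation step can be illustrated with a simple gossip-averaging sketch, in which each client repeatedly replaces its value by the average of its own value and those of its one-hop neighbors. The ring topology, scalar "models", and function name are illustrative assumptions; the local training step between aggregations is left out.

```python
def gossip_step(values, neighbors):
    # Every node averages its own value with those of its one-hop neighbors.
    return {
        node: (values[node] + sum(values[j] for j in neighbors[node]))
        / (1 + len(neighbors[node]))
        for node in values
    }

# Hypothetical ring of four clients, each holding a scalar parameter.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
for _ in range(50):
    values = gossip_step(values, ring)
print(values)  # every client ends up near the network-wide average 2.5
```

On a graph where every node has the same number of neighbors, as in this ring, equal-weight averaging preserves the network mean, so all clients reach the same value a central server would have computed. On irregular graphs the weights would need to be chosen more carefully for that to hold.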

2.3 Privacy and Security

In federated learning, we usually assume the number of participating clients (e.g., phones, cars, clinical institutions) is large, potentially in the thousands or millions. It is impossible to ensure that none of the clients is malicious. The setting of federated learning, where the model is trained locally without revealing the input data or the model’s output to any clients, prevents direct leakage while training or using the model. However, the clients may infer some information about another client’s private dataset given the execution of f(w), or over the shared predictive model w [100]. To this end, there have been many efforts focusing on privacy either from an individual point of view or from multiparty views, especially in the social media field, which has significantly exacerbated multiparty privacy (MP) conflicts [97, 98] (Fig. 3).

Fig. 3 Privacy-preserving schemes. (a) Secure multi-party computation. In secret sharing, secret values (blue and yellow pie) are split into any number of shares that are distributed among the computing nodes. During the computation, no computation node is able to recover the original value or learn anything about the output (green pie). Any nodes can combine their shares to reconstruct the original value. (b) Differential privacy. It guarantees that anyone seeing the result of a differentially private analysis will make the same inference (answer 1 and answer 2 are nearly indistinguishable)

2.3.1 Secure Multi-party Computation

Secure multi-party computation (SMC) has a natural application to federated learning scenarios, where the individual clients use a combination of cryptographic techniques and oblivious transfer to jointly compute a function of their private data [8, 78]. Homomorphic encryption is a public key system, where any party can encrypt its data with a known public key and perform calculations with data encrypted by others with the same public key [29]. Due to its success in cloud computing, it comes naturally into this realm, and it has been used in many federated learning studies [14, 40].

Although SMC guarantees that none of the parties shares anything with each other or with any third party, it cannot prevent an adversary from learning some individual information, e.g., which clients’ absence might change the decision boundary of a classifier. Moreover, SMC protocols are usually computationally expensive even for the simplest problems, requiring iterated encryption/decryption and repeated communication between participants about some of the encrypted results [78].
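
One building block of SMC, additive secret sharing, can be sketched in a few lines. The example below lets several clients compute the sum of their private integers through a set of computing nodes, none of which sees an individual value. It is a toy illustration under stated assumptions: the modulus, function names, and use of Python's non-cryptographic `random` module are choices made for this sketch only, and a real deployment would need a cryptographically secure source of randomness and a full protocol.

```python
import random

PRIME = 2**61 - 1  # modulus for the shares (an illustrative choice)

def share(secret, n_shares, rng):
    # Split a secret into n_shares random-looking values that sum to it.
    shares = [rng.randrange(PRIME) for _ in range(n_shares - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    # Only the combination of all shares reveals the value.
    return sum(shares) % PRIME

def secure_sum(secrets, n_nodes, rng):
    # Each client sends one share to every computing node; each node adds
    # up the shares it received, and the node totals are then combined.
    node_totals = [0] * n_nodes
    for secret in secrets:
        for i, s in enumerate(share(secret, n_nodes, rng)):
            node_totals[i] = (node_totals[i] + s) % PRIME
    return reconstruct(node_totals)

rng = random.Random(0)  # NOT cryptographically secure; for illustration only
private_values = [3, 5, 10]  # hypothetical non-negative client inputs
print(secure_sum(private_values, n_nodes=3, rng=rng))
```

The result equals the plain sum of the inputs, while each computing node only ever handles values that are individually uniform modulo the prime. The repeated exchange of shares also hints at the communication overhead mentioned above.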

2.3.2 Differential Privacy

Differential privacy (DP) [26] is an alternative theoretical model for protecting the privacy of individual data, which has been widely applied in many areas, including not only traditional algorithms, e.g., boosting [27], principal component analysis [15], and support vector machines [86], but also deep learning research [1, 70]. It ensures that the addition or removal of a single record does not substantially affect the outcome of any analysis and is thus also widely studied in federated learning research to prevent indirect leakage [1, 70, 92]. However, DP only protects users from data leakage to a certain extent and may reduce prediction accuracy because it is a lossy method [18]. Thus, some researchers combine DP with SMC to reduce the growth of noise injection as the number of parties increases, while preserving provable privacy guarantees and protecting against extraction attacks and collusion threats [18, 100].
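
A common way to apply DP to model updates is to bound each client's contribution by clipping and then add Gaussian noise to the aggregate. The sketch below illustrates that mechanism only; the function names and parameters are hypothetical, and translating the noise multiplier into a concrete privacy guarantee requires a privacy accounting analysis that is outside the scope of this example.

```python
import math
import random

def clip(update, max_norm):
    # Scale the update down so that its L2 norm is at most max_norm.
    norm = math.sqrt(sum(u * u for u in update))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * factor for u in update]

def dp_average(updates, max_norm, noise_multiplier, rng):
    # Sum the clipped updates, add Gaussian noise scaled to the clipping
    # bound, and divide by the number of clients.
    clipped = [clip(u, max_norm) for u in updates]
    dim = len(updates[0])
    total = [sum(c[j] for c in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(updates) for v in noisy]

rng = random.Random(0)                 # fixed seed for reproducibility only
updates = [[3.0, 4.0], [0.3, 0.4]]     # hypothetical client updates
print(dp_average(updates, max_norm=1.0, noise_multiplier=0.0, rng=rng))
print(dp_average(updates, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping limits how much any single client can move the average, and the added noise masks what remains of that influence. Both steps distort the aggregate, which is the accuracy cost referred to above.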

3 Applications

Federated learning has been incorporated and utilized in many domains. This widespread adoption is due in part to the fact that it enables a collaborative modeling mechanism that allows for efficient ML while ensuring data privacy and legal compliance between multiple parties or multiple computing nodes. Some promising examples that highlight these capabilities are virtual keyboard prediction [39, 70], smart retail [112], finance [109], and vehicle-to-vehicle communication [88]. In this section, we focus primarily on applications within the healthcare space and also discuss promising applications in other domains, since some principles can be applied to healthcare.

3.1 Healthcare

EHRs have emerged as a crucial source of real-world healthcare data that have been used for a wide range of important biomedical research [30, 47], including machine learning research [72]. While providing a huge amount of patient data for analysis, EHRs contain systemic and random biases, both overall and specific to hospitals, that limit the generalizability of results. For example, Obermeyer et al. [76] found that a commonly used algorithm to determine enrollment in specific health programs was biased against African Americans, assigning the same level of risk to healthier Caucasian patients. Such improperly calibrated algorithms can arise for a variety of reasons, such as differences in underlying access to care or low representation in training data. One way to alleviate the risk of such biased algorithms is to learn from EHR data that are more representative of the global population and go beyond a single hospital or site. Unfortunately, for a myriad of reasons such as discrepant data schemes and privacy concerns, it is unlikely that data will ever be connected together in a single database to learn from all at once. The creation and use of standardized common data models, such as OMOP [44], allow for more widespread replication analyses but do not overcome the limitations of joint data access. As such, it is imperative that alternative strategies emerge for learning from multiple EHR data sources that go beyond the common discovery-replication framework. Federated learning might be the tool to enable large-scale representative ML of EHR data, and we discuss several studies that support this view below.

Federated learning is a viable method to connect EHR data from medical institutions, allowing them to share their experiences, and not their data, with a guarantee of privacy [9, 25, 34, 45, 65, 82]. In these scenarios, the performance of ML models can be significantly improved through iterative learning from large and diverse medical data sets. Several tasks have been studied in the federated learning setting in healthcare, e.g., patient similarity learning [62], patient representation learning, phenotyping [55, 67], and predictive modeling [10, 45, 90]. Specifically, Lee et al. [62] presented a privacy-preserving platform in a federated setting for patient similarity learning across institutions. Their model can find similar patients from one hospital to another without sharing patient-level information. Kim et al. [55] used tensor factorization models to convert massive electronic health records into meaningful phenotypes for data analysis in the federated learning setting. Liu et al. [67] conducted both patient representation learning and obesity comorbidity phenotyping in a federated manner and reported good results. Vepakomma et al. [103] built several configurations upon a distributed deep learning method called SplitNN [36] to help health entities collaboratively train deep learning models without sharing sensitive raw data or model details. Silva et al. [93] illustrated their federated learning framework by investigating brain structural relationships across diseases and clinical cohorts. Huang et al. [45] sought to tackle the challenge of non-IID ICU patient data by clustering patients into clinically meaningful communities that captured similar diagnoses and geographic locations and simultaneously training one model per community.

Federated learning has also enabled predictive modeling based on diverse sources, which can provide clinicians with additional insights into the risks and benefits of treating patients earlier [9, 10, 90]. Brisimi et al. [10] aimed to predict future hospitalizations for patients with heart-related diseases using EHR data spread among various data sources/agents by solving an l1-regularized sparse Support Vector Machine classification problem in a federated learning environment. Owkin is using federated learning to predict patients’ resistance to certain treatments and drugs, as well as their survival rates for certain diseases [99]. Boughorbel et al. [9] proposed a federated uncertainty-aware learning algorithm for the prediction of preterm birth from distributed EHR, where the contribution of models with high uncertainty to the aggregated model is reduced. Pfohl et al. [80] considered the prediction of prolonged length of stay and in-hospital mortality across thirty-one hospitals in the eICU Collaborative Research Database. Sharma et al. [90] tested a privacy-preserving framework for the task of in-hospital mortality prediction among patients admitted to the intensive care unit (ICU). Their results show that training the model in the federated learning framework leads to performance comparable to the traditional centralized learning setting. A summary of this work is given in Table 1.

Table 1 Summary of recent work on federated learning for healthcare

3.2 Others

An important application of federated learning is natural language processing (NLP). When Google first proposed the federated learning concept in 2016, the application scenario was Gboard, a virtual keyboard from Google for touchscreen mobile devices with support for more than 600 language varieties [39, 70]. Indeed, as users increasingly turn to mobile devices, fast mobile input methods with auto-correction, word completion, and next-word prediction features are becoming more and more important. For these NLP tasks, especially next-word prediction, text typed in mobile apps is usually better than data from scanned books or speech-to-text in terms of aiding typing on a mobile keyboard. However, these language data often contain sensitive information, e.g., passwords, search queries, or text messages with personal information. Therefore, federated learning has a promising application in NLP tasks such as virtual keyboard prediction [7, 39, 70].

Other applications include smart retail [112] and finance [54]. Specifically, smart retail aims to use machine learning technology to provide personalized services to customers, such as product recommendation and sales services, based on data like user purchasing power and product characteristics. In terms of financial applications, Tencent’s WeBank leverages federated learning technologies for credit risk management, where several banks can jointly generate a comprehensive credit score for a customer without sharing his or her data [109]. With the growth and development of federated learning, many companies and research teams have developed tools oriented to scientific research and product development. Popular ones are listed in Table 2.

Table 2 Popular tools for federated learning research

4 Conclusions and Open Questions

In this survey, we review the current progress on federated learning, including, but not limited to, the healthcare field. We summarize the general solutions to the various challenges in federated learning and hope to provide a useful resource for researchers. Besides the general issues in the federated learning setting summarized above, we list below some directions and open questions likely to be encountered when federated learning is applied in healthcare.

  • Data Quality. Federated learning has the potential to connect all the isolated medical institutions, hospitals, or devices so that they can share their experiences with a privacy guarantee. However, most health systems suffer from data clutter and efficiency problems. The quality of data collected from multiple sources is uneven and there is no uniform data standard. The analyzed results are of little value when dirty data are accidentally used as samples. The ability to strategically leverage medical data is critical. Therefore, how to clean, correct, and complete data, and accordingly ensure data quality, is key to improving the machine learning model, whether we are dealing with a federated learning scenario or not.

  • Incorporating Expert Knowledge. In 2016, IBM introduced Watson for Oncology, a tool that uses a natural language processing system to summarize patients’ electronic health records and search the powerful database behind it to advise doctors on treatments. Unfortunately, some oncologists say they trust their own judgment more than what Watson tells them needs to be done. Therefore, doctors should ideally be involved in the training process. Since not every data set collected will be of high quality, it will be very helpful if the standards of evidence-based machine are introduced, so that doctors can also see the diagnostic criteria of artificial intelligence. If these are wrong, doctors can give further guidance to the artificial intelligence to improve the accuracy of the machine learning model during the training process.

  • Incentive Mechanisms. With the internet of things and the variety of third-party portals, a growing number of smartphone healthcare apps are compatible with wearable devices. In addition to data accumulated in hospitals or medical centers, another type of data of great value comes from wearable devices, not only for researchers but, more importantly, for the owners. However, during the federated model training process, the clients suffer from considerable overhead in communication and computation. Without well-designed incentives, self-interested mobile or other wearable devices will be reluctant to participate in federated learning tasks, which will hinder the adoption of federated learning [53]. How to design an efficient incentive mechanism to attract devices with high-quality data to join federated learning is another important problem.

  • Personalization. Wearable devices are more focused on public health, which means helping people who are already healthy to improve their health, such as helping them exercise, practice meditation, and improve their sleep quality. How to assist patients in carrying out scientifically designed personalized health management, correct the functional pathological state by examining indicators, and interrupt the pathological change process is very important. Reasonable chronic disease management can avoid emergency visits and hospitalization and reduce the number of visits, saving cost and labor. Although there is some general work on federated learning personalization [48, 94], for healthcare informatics, how to combine medical domain knowledge and personalize the global model for every medical institution or wearable device is another open question.

  • Model Precision. Federated learning tries to make isolated institutions or devices share their experiences, and the performance of machine learning models will be significantly improved by the resulting large medical dataset. However, the prediction tasks are currently restricted and relatively simple. Medical treatment itself is a very professional and precise field. Medical devices in hospitals have incomparable advantages over wearable devices. And the models of Doc.ai can predict a phenome collection of one’s biometric data, such as height, weight, age, sex, and BMI, from a selfie. How to improve the prediction model to predict future health conditions is definitely worth exploring.