
Expert Systems With Applications 237 (2024) 121390


FedEL: Federated ensemble learning for non-iid data


Xing Wu a,b,c,∗, Jie Pei a, Xian-Hua Han d, Yen-Wei Chen e, Junfeng Yao f, Yang Liu a, Quan Qian a,b,c, Yike Guo g

a School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
b Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
c Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
d Graduate School of Artificial Intelligence and Science, Rikkyo University, Tokyo 1718501, Japan
e Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu-shi 525-0058, Japan
f CSSC Seago System Technology Co., Ltd, Shanghai 200010, China
g The Hong Kong University of Science and Technology, Hong Kong 999077, China

ARTICLE INFO

Keywords: Federated learning; Statistical heterogeneity; Ensemble learning

ABSTRACT

Federated learning (FL) is a joint training paradigm that fully utilizes data information while protecting data privacy. A key challenge in FL is statistical heterogeneity, which arises from the differing local data distributions among clients, leading to inconsistent local optimization goals and ultimately reducing the performance of globally aggregated models. We propose Federated Ensemble Learning (FedEL), which makes full use of the heterogeneity of data distribution among clients to train a group of diverse weak learners and integrates them into a global model, offering a novel solution to the non-independent and identically distributed (non-IID) data problem. Experiments demonstrate that the proposed FedEL improves performance in non-IID data scenarios. Even under extreme statistical heterogeneity, the average accuracy of FedEL is 3.54% higher than that of state-of-the-art FL methods. Moreover, the proposed FedEL reduces model storage and inference costs compared with traditional ensemble learning. FedEL also demonstrates good generalization ability in experiments across different datasets, including natural scene image datasets and medical image datasets.

1. Introduction

Deep Learning has made notable advances in recent years, with its impact felt in various domains (Wu, Chen et al., 2022; Wu, Zhong, Guo, & Fujita, 2020). It is widely acknowledged that the premise and foundation for a Deep Learning model to perform well is the availability of good-quality data. Typically, data is distributed and stored across various clients, including hospitals, financial companies, factories, and enterprises. Data islands arise due to public attention and the need for data security and privacy (Gong, Feng, & Xie, 2020), restrictions imposed by laws and regulations (Voigt & Von dem Bussche, 2017), as well as constraints on data storage and communication. Therefore, breaking data islands and maximizing the use of available data for training models becomes a practical problem.

Federated Learning (FL) can provide an effective solution to these issues. FL is a technique that facilitates the joint training of machine learning models from multiple clients without exchanging private data (Kairouz et al., 2021). FedAvg, introduced by McMahan et al., is among the most popular algorithms in FL (McMahan, Moore, Ramage, Hampson, & y Arcas, 2017). During every round of FedAvg, the clients execute several rounds of stochastic gradient descent (SGD). After receiving the model updates from the clients, the server performs an average model aggregation. FL has opened up a new field of research in artificial intelligence and has garnered the interest of numerous researchers (Gamboa-Montero, Alonso-Martin, Marques-Villarroya, Sequeira, & Salichs, 2023; Wang et al., 2023). FL provides a novel training method and privacy protection mechanism. With the possibility of implementation in various domains such as finance (Cheng, Liu, Chen, & Yang, 2020) and healthcare (Borger et al., 2022), FL has already been applied in areas including target detection (Liu et al., 2020), medical imaging (Wu, Pei et al., 2022), and communication (Paragliola & Coronato, 2022; Wu, Liang, & Wang, 2020).

FL encounters several obstacles, with statistical heterogeneity of the data being one of the most significant (Kairouz et al., 2021). The variation between local and global optimization objectives caused by inconsistencies in data distribution among clients can impede the final result (Zhu, Xu, Liu, & Jin, 2021).

∗ Corresponding author at: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.
E-mail addresses: xingwu@shu.edu.cn (X. Wu), email_pj@shu.edu.cn (J. Pei), hanxhua@rikkyo.ac.jp (X.-H. Han), chen@is.ritsumei.ac.jp (Y.-W. Chen),
yaojf@seagoing.com.cn (J. Yao), yangliu_cs@shu.edu.cn (Y. Liu), qqian@shu.edu.cn (Q. Qian), yikeguo@ust.hk (Y. Guo).

https://doi.org/10.1016/j.eswa.2023.121390
Received 22 March 2023; Received in revised form 17 August 2023; Accepted 29 August 2023
Available online 9 September 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

Fig. 1. The FedEL framework.

For example, several hospitals may want to cooperate to establish a disease detection and classification model. However, different hospitals may have different distributions of patient sample data (Woźniak, Siłka, & Wieczorek, 2021; Woźniak, Wieczorek, & Siłka, 2023), eventually leading to poor performance of the jointly established model. The difference in patient sample data distribution is closely related to regional economic development, ecological environment, medical conditions, population density, and other factors. While FedAvg can handle some non-IID data, the accuracy of neural networks trained by FedAvg on image datasets decreases significantly with increasing label imbalance when facing different degrees of non-IID data, as our experiments show (see Section 4). Several studies have proposed to address the non-IID problem by modifying the FedAvg algorithm. FedProx (Li, Sahu et al., 2020) uses the ℓ2-norm distance to constrain local updates directly. SCAFFOLD (Karimireddy et al., 2019) corrects the local updates through variance reduction. MOON (Li, He, & Song, 2021) enhances the consistency between the learned representations of the current local and global models. FedDyn adds a regularization term when the local model is updated (Durmus et al., 2021).

This work puts forward a solution to the non-IID problem based on a straightforward idea. Due to uneven data distribution, the local models updated by each client during local training become weak learners in the later FL training period. Integrating this group of weak learners may yield better results than model aggregation. We therefore introduce the FedEL algorithm, inspired by the stacking method in ensemble learning (see Fig. 1), which integrates the group of weak learners into one model, offering a new approach to resolving the non-IID problem. The effectiveness of FedEL is evaluated through experiments on natural scene datasets (CIFAR-10 (Krizhevsky, Hinton, et al., 2009), CIFAR-100, Tiny-Imagenet (Benbrahim & Behloul, 2021)) and a medical image dataset (Hyper-Kvasir (Borgli et al., 2020)). FedEL is compared to other FL algorithms and common ensemble learning algorithms. According to the experiments, FedEL performs better than other state-of-the-art FL algorithms and common ensemble learning algorithms on both the natural scene and medical image datasets.

The main contributions of this work are outlined as follows:

1. The FedEL algorithm proposed in this work addresses the non-IID problem in FL by integrating ensemble learning techniques. Specifically, the algorithm integrates a group of weak learners to obtain a more accurate and robust model. This approach effectively leverages the global and local information in the weak learners and offers a new method for the non-IID challenge.
2. Compared to advanced FL algorithms and traditional ensemble learning methods, FedEL offers several advantages. Firstly, it can improve model performance in non-IID data scenarios. Secondly, it does so while minimizing storage and inference costs, which is not the case with traditional ensemble learning methods.
3. The generalization and validity of FedEL are evaluated on two kinds of datasets: natural scene image datasets and a medical image dataset. Across diverse datasets, FedEL consistently outperforms other state-of-the-art FL algorithms and common ensemble learning methods in the experiments, which validates its broad applicability and robustness.

The organization of this work is as follows. Section 2 presents the background and related federated learning and ensemble learning works. Section 3 explains FedEL in detail. In Section 4, we validate the effectiveness and robustness of FedEL through extensive experiments on diverse natural scene and medical image datasets. In Section 5, we conclude and discuss the advantages and limitations of FedEL, and give possible solutions for future work.


2. Related work

This section covers two main topics: the development of FL and current solutions to non-IID challenges, as well as the development of ensemble learning and a brief introduction to the stacking method.

2.1. Federated learning

The FedAvg algorithm was introduced in 2016 and was subsequently leveraged for privacy protection in mobile internet terminals (McMahan et al., 2017). To address data islands, Yang et al. proposed secure FL, which has been widely accepted (Yang, Liu, Chen, & Tong, 2019). In 2021, the IEEE released the first international standard white paper on federated learning, indicating the growing importance of and interest in the field (Yang, Fan, Tong, & Lv, 2021).

FedAvg has become a prominent algorithm in FL. The FedAvg framework is illustrated in Step1 of Fig. 1. Each round of FedAvg can be split into the following steps: In the beginning, the server disseminates the global model to all clients. Then, the clients use SGD to perform several rounds of local training. Next, the clients upload their local models. Finally, the server performs model aggregation to obtain the updated global model.

Li, Fan, Tse, and Lin (2020) demonstrated through experiments that the accuracy of a model trained using the FedAvg algorithm is significantly reduced when the data is highly skewed non-IID data. The decrease in accuracy can be attributed to weight divergence. Li et al. proposed FedProx (Li, Sahu et al., 2020), which modifies the local optimization objective of the client. Specifically, a proximal term is added. The proximal term is the ℓ2-norm distance, which prevents significant deviations in the local update and diminishes the impact of data heterogeneity. Karimireddy et al. proposed SCAFFOLD (Karimireddy et al., 2019). By utilizing control variates (variance reduction), SCAFFOLD corrects for ''client-drift'' in local updates. Li et al. proposed MOON (Li et al., 2021), which corrects local updates on clients by exploiting the similarity between model representations, that is, contrastive learning at the model level. They perform experiments on several classical image classification datasets. Durmus et al. proposed FedDyn (Durmus et al., 2021), which adds a regularization term when the local model is updated. The regularization term guides the convergence of each client model towards a globally optimal direction. All of the above algorithms focus on consistency between local and global models and diminish the impact of non-IID data by limiting or correcting local model updates. Therefore, we hope to propose an algorithm framework that can be used for Deep Learning models in FL to process non-IID data without regulating the update.

Among other research, Wang et al. proposed FedNova (Wang, Liu, Liang, Joshi, & Poor, 2020), a normalized averaging method for addressing objective inconsistency in non-IID data within FL. Meanwhile, Gao et al. introduced FedDC (Gao et al., 2022), which adds a local drift variable for lightweight modifications to local updates. Our study is orthogonal to these approaches, and there is an opportunity for combination in future research.

2.2. Ensemble learning

Ensemble learning approaches are used in various machine learning challenges, often training multiple models so that the prediction of a single model is replaced by the predictions of multiple models. Suppose that in the classification task the dataset is represented as D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ R^d}, with |D| = n, where (x_i, y_i) represents the ith sample x_i and its corresponding label y_i. The samples have m features, the labels are d-dimensional vectors, and D has n samples. The ensemble learning model ψ uses an aggregation function G to aggregate the predictions of k weak classifiers {f_1(x_i), f_2(x_i), …, f_k(x_i)} on sample x_i into the result ŷ_i:

ŷ_i = ψ(x_i) = G(f_1(x_i), f_2(x_i), …, f_k(x_i))    (1)

Therefore, the training of weak classifiers and the selection of aggregation functions are essential in the construction of ensemble learning models. There are two fundamental principles for training weak classifiers: first, the participating weak classifiers should have enough diversity; second, the prediction performance of each single weak classifier should be as good as possible. There are many representative methods for selecting G. The most direct is the weighting method. Still taking the classification problem as an example, the conditional probability vectors predicted by the multiple weak classifiers are added together, and the class where the maximum value appears is selected:

ŷ_i = arg max ψ(x_i),  where  ψ(x_i) = ∑_{k=1}^{K} f_k(x_i)    (2)

Here ŷ_i represents the prediction result of the ith sample, ψ represents the ensemble learning model, x_i denotes the ith sample, K denotes the number of weak learners, and f represents the weak learners.

In the realm of meta-learning techniques, Stacking is one of the most frequently used methods in addition to weighting approaches. This technique was initially proposed by Wolpert (1992). The Stacking method is divided into two stages. Training multiple base models on the training set is the first step. Following this, the predictions made by these models on the training set samples are used as input to train a meta-model. Inspired by this, we propose FedEL.

The combination of ensemble learning and FL has been the subject of extensive research, with particular emphasis on privacy protection, personalization, and communication in FL (Lin, Kong, Stich, & Jaggi, 2020). Nevertheless, the non-IID problem still exists.

3. Method

This section first defines the problem formulation of FL, where the non-IID problem arises due to inconsistency between the overall and local goals. Subsequently, the motivation for proposing FedEL is explained, which aims to address the non-IID problem by using ensemble learning. Finally, a detailed introduction to the proposed FedEL algorithm is provided.

3.1. Problem formulation

We suppose that there are m clients in FL, denoted {C_1, C_2, …, C_m}. The global model parameters are denoted as w, the local model parameters are denoted as w_k, the data is distributed across the clients, and the local dataset is denoted as D_k. In round r of FL training, each client downloads the global model parameters w^r and loads them into the local model. Each client then performs local training to get updated local model parameters w_k^{r+1}. Formally, the gradient descent method shown in Eq. (3) is adopted in the local training process:

w_k^{r+1} = w_k^r − η ∇_{w_k^r} l(w_k^r; b)    (3)

Here η, l(⋅), and b represent the learning rate, the loss function, and the batch data, respectively. The server then updates the global model parameters w^{r+1}:

w^{r+1} = ∑_{k=1}^{m} (n_k / n) w_k^{r+1}    (4)

Here n denotes the overall number of samples and n_k represents the number of samples owned by the kth client.

The problem of FL is to solve an empirical risk minimization problem:

min_{w ∈ R^d} F(w) = ∑_{k=1}^{m} (n_k / n) F_k(w_k)    (5)

where F(⋅) is the total empirical loss and F_k(w_k) = (1/n_k) ∑_{x ∈ D_k} l_k(w_k; x) is the kth client's empirical loss.
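As a rough illustration of the local update in Eq. (3) and the weighted aggregation in Eq. (4), the following is a minimal PyTorch-style sketch. It is not the authors' implementation; function names such as local_update and fedavg_aggregate, and details such as the assumption of floating-point parameters, are illustrative only.

```python
import copy
import torch

def local_update(model, loader, epochs, lr, device="cpu"):
    """Eq. (3): a few epochs of SGD on one client's local data."""
    model = copy.deepcopy(model).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fedavg_aggregate(client_states, client_sizes):
    """Eq. (4): sample-size-weighted average of client parameters
    (assumes all state_dict entries are floating-point tensors)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key] * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg
```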


In the actual situation, as shown in Eq. (6), the data distribution among clients may vary significantly, and the overall and local goals are inconsistent. As a result, a ''client-drift'' phenomenon may occur during local model training, leading to the non-IID problem.

E_{x ∼ D_k}[l(w; x)] ≠ E_{x ∼ D}[l(w; x)]    (6)

3.2. Motivation

As previously mentioned, many existing methods address the non-IID problem by restricting or correcting local client updates. However, experimental results indicate that these algorithms have a limited impact on model performance.

FedEL tackles the non-IID problem by leveraging a straightforward observation. As a result of the disparate data distribution across clients, the local models updated via local training become a collection of weak learners during the later stages of FL training. These weak learners possess both global and local information and naturally exhibit diverse characteristics. Because each local model is obtained from the global model after local training, it also delivers reliable prediction performance. Rather than simply aggregating the individual models, FedEL integrates this group of local models to produce superior results. However, directly incorporating ensemble learning into FL may lead to specific issues. For instance, conventional ensemble methods such as the weighting method require the storage of multiple models, thereby consuming significant storage resources. Additionally, these methods rely on averaging the inference results of multiple models to generate sample predictions, which multiplies the computational burden. In order to effectively utilize the advantages of ensemble learning while reducing the storage and computing burden, FedEL draws on the idea of Stacking to integrate the models. Specific details are described in Section 3.3.

3.3. Federated ensemble learning framework

To address these challenges, we introduce FedEL as a solution. Fig. 1 and Algorithm 1 illustrate the steps involved in the proposed method. Specifically, FedEL comprises the following steps:

1. Conduct FL training (for example, using the FedAvg algorithm). Stop training when the model converges or a certain number of training rounds is reached.
2. All clients load the global model parameters into the local model and perform fine-tuning training. Unlike the local update performed in Step1, only the classifier module is updated in this step. After training, the classifier module is uploaded to the server.
3. Build a new global model using the classifier modules uploaded in Step2 and distribute it to each client.
4. FL training resumes on the new global model, with each client updating only the parameters of the last fully connected layer.

Algorithm 1 FedEL. w is the model parameter, t is the current training cycle, G is the number of global epochs, L is the number of local epochs, b is the batch size, D_k is the local data of the k-th client, and C_k(⋅) is the k-th client's classifier module.

Step1:
  Server:
    initialize w^0
    for t from 0 to G − 1:
      for each client k:
        w_k^{t+1} ← LocalUpdate(k, w^t)
      w^{t+1} ← ∑_{k=1}^{m} (n_k / n) w_k^{t+1}
  Client:
    LocalUpdate(k, w):
      for i from 1 to L:
        for batch b ∈ D_k:
          w ← w − η∇ℓ(w; b)
      return w

Step2, 3:
  Each client k gets w from the server and loads it into w_k
  Each client k:
    Fine-tunes the local model, updating only the classifier module C_k(⋅)
    Sends C_k(⋅) to the server
  The server builds a new global model w_new

Step4:
  Server:
    for t from 0 to G − 1:
      for each client k:
        w_{k,new}^{t+1} ← LocalUpdate(k, w_new^t)
      w_new^{t+1} ← ∑_{k=1}^{m} (n_k / n) w_{k,new}^{t+1}
  Client:
    LocalUpdate(k, w_new):
      for i from 1 to L:
        for batch b ∈ D_k:
          w_new ← w_new − η∇ℓ(w_new; b)
      return w_new

Both Step1 and Step4 can be implemented through the use of the FedAvg algorithm. We now explain the key steps involved in fine-tuning the local model (Step2) and constructing the ensemble model (Step3).

Fig. 2 describes the process of building the new global model. Firstly, the local model is fine-tuned on the local dataset. The local model is divided into two components, namely the feature extraction module G(⋅) and the classifier module C(⋅). Generally, C(⋅) corresponds to the last several layers of the model, such as the fully connected layers, because this part of the network contains more coarse-grained, semantic information, which directly determines the output results. The rest of the network structure is G(⋅), whose parameters are frozen during fine-tuning; this part of the network contains fine-grained information. Secondly, all clients upload only their C(⋅). Finally, the server attaches the group of received C(⋅) after G(⋅), and a fully connected layer is appended to form the global model.

In Step2 of FedEL, each client only updates the parameters of the classifier module C(⋅) and transmits them to the server. It is noteworthy that an excessive learning rate or too many local updates may result in the local model overfitting the local data and losing important global information. Since the data is non-IID, overemphasizing local information may lead to poor results. Similarly, this may also adversely affect traditional ensemble learning methods like Voting and Averaging. In Step3 of FedEL, the server concatenates the collected {C_1(⋅), C_2(⋅), …, C_n(⋅)} to G(⋅) in parallel. Each C(⋅) outputs a probability vector (assuming m dimensions) after a Softmax operation; these vectors are merged into an n × m dimensional vector and used as input to the last layer. Suppose there is a sample x. The sample x first passes through the feature extraction module to get G(x); then G(x) passes through each classifier module to get a group of prediction outputs, {C_1(G(x)), C_2(G(x)), …, C_n(G(x))}, which are then concatenated and sent into the last fully connected layer for the final prediction. The prediction process is shown in Eq. (7).

ŷ = w_new(x) = LastFC[C_1(G(x)), C_2(G(x)), …, C_n(G(x))]    (7)

Because each client only updates the classifier module C(⋅) in Step2, all C(⋅) can share one G(⋅) when the server builds the new global model in Step3. This approach saves a lot of storage space compared to general ensemble learning methods that require multiple models to be saved. With only one new global model, the computational burden of model inference is significantly reduced. In Step4, FL training is conducted on the new global model. By updating and uploading only the last layer, similar to the stacking method in ensemble learning, the client can improve the training efficiency and reduce sensitivity to the learning rate and the number of local updates used during the fine-tuning process in Step2.
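The Step2 fine-tuning described above can be sketched roughly as follows. This is a hedged illustration, not the authors' code: it assumes the model exposes `features` (G(⋅)) and `classifier` (C(⋅)) submodules, and the function name is illustrative.

```python
import torch

def finetune_classifier_only(model, loader, epochs=10, lr=0.01, device="cpu"):
    """Step2 of FedEL (sketch): freeze G(.) and fine-tune only C(.)."""
    model = model.to(device)
    for p in model.features.parameters():      # G(.) stays frozen
        p.requires_grad = False
    opt = torch.optim.SGD(model.classifier.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.classifier.state_dict()        # only C(.) is sent to the server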


Fig. 2. The new global model building process in FedEL.
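To make the construction shown in Fig. 2 concrete, the following is a minimal PyTorch-style sketch of the new global model: a shared feature extractor G(⋅) feeds the collected classifier modules in parallel, and a final fully connected layer maps their concatenated Softmax outputs to the prediction of Eq. (7). The class and argument names are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class FedELGlobalModel(nn.Module):
    """New global model (sketch): shared G(.), n client classifiers C_k(.),
    and a last fully connected layer over their concatenated outputs."""
    def __init__(self, feature_extractor, classifiers, num_classes):
        super().__init__()
        self.G = feature_extractor                  # shared, typically frozen
        self.classifiers = nn.ModuleList(classifiers)
        self.last_fc = nn.Linear(len(classifiers) * num_classes, num_classes)

    def forward(self, x):
        g = self.G(x)                               # G(x)
        outs = [torch.softmax(c(g), dim=1) for c in self.classifiers]
        stacked = torch.cat(outs, dim=1)            # n x m dimensional vector
        return self.last_fc(stacked)                # Eq. (7)
```

In Step4, only `last_fc` would be trained and exchanged, which keeps the communication and storage cost close to that of a single model.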

In conclusion, FedEL addresses the non-IID problem by integrating ensemble learning into the federated learning framework. By leveraging the heterogeneity of the data distribution, FedEL obtains a group of diverse weak learners and integrates them into the global model. This approach not only enhances the performance of the global model but also reduces its storage and inference overhead.

4. Experiments

This section begins by detailing the experimental setup. It then verifies the role of ensemble learning. Next, a large number of comparative experiments are conducted to validate the effectiveness of FedEL. This section also verifies the robustness of FedEL and its combinability with other algorithms. Finally, the inference time of FedEL is compared with that of common ensemble learning algorithms.

4.1. Experimental setup

4.1.1. Implementation

In the experiments, the GPU is a single NVIDIA GeForce RTX3090, the CPU is an Intel Xeon Silver 4210R, and the neural network models are built using the PyTorch library. The experimental results are statistically analyzed with the NumPy library in Python, and all result graphs are drawn with the Matplotlib library in Python. The model inference time reported in the experiments includes data I/O and preprocessing. Moreover, since we mainly study the decrease in accuracy of federated learning models caused by data heterogeneity, most experiments take accuracy as the evaluation indicator.

4.1.2. Validation procedure

This work conducts image classification experiments on three natural image datasets (CIFAR-10, CIFAR-100, Tiny-Imagenet) and one medical image dataset (Hyper-Kvasir). We randomly divide the image data in Hyper-Kvasir into training and test sets in an 8:2 ratio; the other datasets already have training and test sets. We perform comparison experiments with FedAvg, FedProx, SCAFFOLD, MOON, and FedDyn. Moreover, we also compare two ensemble learning methods, Voting and Averaging. The so-called voting and averaging methods predict samples using a set of weak classifiers and obtain the final prediction by voting on or averaging the individual results; this set of weak classifiers is obtained after Step2 of FedEL. The local data is generated by a Dirichlet distribution to simulate real-world non-IID situations. We sample p_i ∼ Dir(α) and assign a proportion p_{i,j} of the instances of class i to client j, where α is the distribution parameter and the distribution changes as α changes. As α grows from small to large, the data distribution shifts from concentrated to uniform. We conduct experiments with three distribution parameters: α = 0.1, α = 0.5, and α = 5. As shown in Fig. 3, ten-client non-IID data distributions are generated by Dirichlet distributions on the CIFAR-10 dataset. In each subgraph, the horizontal axis indicates the clients and the vertical axis indicates the sample classes, with the color indicating the number of samples; each grid cell indicates the number of samples of a class held by a client. For example, coordinate (1, 9) in subfigure (a) indicates that client 1 holds more than 4000 samples of class 9. When α = 0.1, the data distribution is highly unbalanced, and a client may have many, few, or no samples of some classes. When α = 5, the data distribution is nearly uniform, and each client has a relatively even amount of each sample class. Moreover, we also conduct experiments under extreme distributions; for example, CIFAR-10 has 10 classes, and each client holds samples of only two randomly selected classes.
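A minimal sketch of the Dirichlet-based non-IID partition described above follows. This is a common construction in the FL literature rather than the paper's released code; the function name and exact indexing are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """For each class i, draw p_i ~ Dir(alpha) over clients and allocate that
    class's samples proportionally. Smaller alpha -> more skewed partition."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```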

4.1.3. Model architecture

We use two model architectures in the experiments. The first is a 7-layer CNN. It includes two convolutional layers with 6 and 16 channels, respectively; each uses a 5 × 5 kernel and is followed by a 2 × 2 max pooling layer. These are followed by two fully connected layers with 120 and 84 units. Because MOON adopts contrastive learning, the projection head is critical in its network structure; to be consistent with MOON, we adopt a two-layer MLP as the projection head, with 84 and 256 units. Finally, a fully connected output layer is attached. The first two convolutional layers are used as the feature extraction module G(⋅), and the later network structure is used as the classifier module C(⋅). For the second model architecture, ResNet-18 (He, Zhang, Ren, & Sun, 2016) is used as G(⋅), and the last fully connected layer of ResNet-18 is replaced with the C(⋅) of the first architecture. The first model architecture is used for CIFAR-10 and Hyper-Kvasir, and the second model architecture is used for the other datasets.

4.1.4. Parameter settings

For consistency with the experiments conducted in the previous study (Li et al., 2021), we use the SGD optimizer with a learning rate of 0.01 for all compared methods and for Step1 of FedEL. The SGD momentum is 0.9, and the weight decay is 0.00001.

Fig. 3. Heat maps of different Dirichlet distribution parameters on CIFAR-10. (a) α = 0.1. (b) α = 0.5. (c) α = 5.
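For reference, the first (7-layer CNN) architecture described in Section 4.1.3 can be sketched roughly as follows. The layer sizes follow the text; padding choices, activations, and the exact split into G(⋅) and C(⋅) are our assumptions rather than the authors' exact implementation.

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Sketch of the 7-layer CNN: two 5x5 conv layers (6, 16 channels) with
    2x2 max pooling as G(.), then FC(120)-FC(84), a 2-layer projection head
    (84, 256), and a final fully connected output layer as C(.)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(             # G(.)
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(           # C(.)
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 84), nn.ReLU(),          # projection head (assumed)
            nn.Linear(84, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```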

The batch size is 64, and the number of local updates is 10. The number of communication rounds is 30 for Tiny-Imagenet and Hyper-Kvasir and 100 for CIFAR-10 and CIFAR-100 (more communication rounds do not improve accuracy). For Step4 of FedEL, the learning rate is 0.1 and the number of local updates is 1. In all of the following experiments, if a hyper-parameter value is not indicated, the above values are used by default. By default, the number of clients is 10.

4.2. Datasets introduction

1. CIFAR-10: It has 10 categories. Each category has 6000 RGB images. The size of each image is 32 × 32.
2. CIFAR-100: It has 100 categories. Each category has 600 RGB images. The size of each image is 32 × 32.
3. Tiny-Imagenet: It contains 200 categories. Each category has 600 RGB images. The size of each image is 64 × 64.
4. Hyper-Kvasir: It is a large public gastrointestinal endoscopy dataset. The data come from actual patients. We select 8 types of gastrointestinal disease samples for the image classification experiment. The images come in different sizes.

4.3. Effect of ensemble learning

To verify how much ensemble learning helps with non-IID problems in FL, we conducted a simple image classification experiment on CIFAR-10. The specific settings are as follows: the FL training algorithm is FedAvg; 10 clients are involved in the training; the network model is the first architecture, the 7-layer CNN; the number of global training rounds is 50 (more training rounds provide little benefit for accuracy), and the number of local updates per client is 10; the batch size is 64; we use the SGD optimizer with a learning rate of 0.001. Experiments are carried out with different degrees of non-IID data.

After 50 training rounds, the global model is obtained, denoted as w_g. Then, an additional round of FL training is carried out. However, after the local model updates are uploaded, the server does not aggregate the models but directly saves the set of local models, denoted as w^1 = {w_l^1, w_l^2, …, w_l^10}. In addition, we add another set of experiments obtained in the same way as w^1, except that during the additional round the module G(⋅) of the model is frozen and its parameters are not updated, yielding w^2 = {w_l*^1, w_l*^2, …, w_l*^10}. For w^1 and w^2, the prediction results of samples are calculated as in Eq. (8). This prediction form is the classic Averaging method in ensemble learning.

ŷ = (1/10) ∑_{k=1}^{10} w_l^k(x)    (8)

Table 1
Accuracy under different degrees of non-IID on CIFAR-10.
Model    α = 0.1    α = 0.5    α = 5
w_g      55.83%     58.11%     64.50%
w^1      57.77%     60.64%     64.82%
w^2      57.57%     60.62%     64.86%

As shown in Table 1, α represents the degree of non-IID. It can be found that as the data distribution skew deepens, w^1 and w^2 show obvious advantages over w_g, which supports the effectiveness of our idea. It is preliminarily shown that ensemble learning can effectively help solve the non-IID problem. Moreover, the gap between w^1 and w^2 is small, which indicates that non-IID data may mainly affect the last few layers of the global model architecture in federated learning.

4.4. Comparative experiments

For each client, different numbers of local updates are used for the fine-tuning training in Step2 of FedEL, depending on the dataset. For CIFAR-10 and Hyper-Kvasir, the local updates are set to 10; for the other datasets, they are set to 1. Since the loss functions of FedProx, MOON, and FedDyn all have a hyper-parameter μ (i.e., ℓ ← ℓ_CE + μℓ_prox, ℓ ← ℓ_CE + μℓ_con, ℓ ← ℓ_CE + μℓ_dyn), we run several experiments to ensure a fair comparison, using different μ and reporting the best results. For FedProx and FedDyn, μ is set to 0.001, 0.01, and 0.1. For MOON, μ is set to 0.1, 1, and 5. For FedProx, the best value of μ on all datasets is 0.001. For FedDyn, the best value of μ on all datasets is 0.1. For MOON, the best values of μ are 5, 1, 1, and 1, respectively.

Table 2 presents the accuracy of FedEL, five advanced FL algorithms, and two common ensemble learning algorithms on the three natural scene datasets with varying levels of non-IID data. Although FedProx, SCAFFOLD, MOON, FedDyn, Voting, and Averaging perform better than the baseline FedAvg under many experimental settings, these methods do not consistently exceed the baseline in all experiments. In contrast, FedEL not only exceeds the baseline but also outperforms the above state-of-the-art methods under every experimental setting. It is noteworthy that the smaller the value of α, the more obvious the improvement of FedEL over the baseline FedAvg. This shows that FedEL effectively utilizes the heterogeneity of the data distribution to strengthen the diversity among the weak learners and thus enhance the accuracy of the global model. Specifically, when α = 0.1, FedEL achieves an average accuracy that is 2.29% higher than the baseline FedAvg, while at α = 0.5, the average accuracy of FedEL is 1.88% higher than FedAvg.

Table 3 presents the test accuracy of all methods under extreme non-IID settings for the three natural scene datasets. In CIFAR-10, each client has only two data classes with equal amounts. Similarly, for CIFAR-100 and Tiny-Imagenet, each client has only 10 or 20 classes. FedProx, MOON, and FedDyn perform better than the baseline FedAvg in most experimental settings, but the accuracy of SCAFFOLD is worse than FedAvg. Averaging performs better than FedAvg in all experimental settings. However, the accuracy of Voting is inferior, indicating that the voting method is unsuitable for extreme non-IID data settings. FedEL outperforms the other methods in all experimental settings, demonstrating its effectiveness in the case of highly heterogeneous data distributions.

Table 4 displays the test accuracy of all methods on the medical image dataset across various non-IID settings. FedProx, MOON, and FedDyn perform better than the baseline FedAvg under some experimental settings, while the accuracy of SCAFFOLD is worse than FedAvg.


Table 2
Accuracy under different degrees of non-IID on the natural scene image datasets.
Datasets     CIFAR-10                       CIFAR-100                      Tiny-Imagenet
non-IID      α = 0.1   α = 0.5   α = 5      α = 0.1   α = 0.5   α = 5      α = 0.1   α = 0.5   α = 5
FedAvg       64.95%    67.22%    73.20%     63.24%    70.00%    70.96%     31.37%    37.61%    38.92%
FedProx      65.21%    67.71%    72.55%     63.21%    70.14%    71.04%     31.90%    37.35%    38.09%
SCAFFOLD     64.03%    68.62%    72.96%     62.13%    68.96%    70.54%     28.78%    31.71%    32.63%
MOON         61.56%    68.35%    73.08%     63.44%    70.56%    71.53%     32.21%    37.81%    38.60%
FedDyn       65.10%    67.04%    72.68%     63.45%    69.87%    71.30%     31.30%    36.94%    37.41%
Voting       61.73%    67.64%    73.75%     63.29%    70.22%    70.95%     32.00%    37.98%    38.77%
Averaging    65.04%    68.93%    73.52%     63.99%    70.35%    71.06%     33.36%    38.28%    39.02%
FedEL        68.29%    71.07%    73.85%     64.41%    70.84%    71.67%     33.72%    38.56%    39.24%

Table 3
Accuracy under extreme non-IID on the natural scene image datasets.
Datasets     CIFAR-10     CIFAR-100     Tiny-Imagenet
non-IID      2-classes    10-classes    20-classes
FedAvg       56.70%       39.86%        16.50%
FedProx      59.42%       40.49%        15.80%
SCAFFOLD     54.16%       32.15%        13.51%
MOON         58.78%       41.10%        16.47%
FedDyn       57.24%       40.37%        16.83%
Voting       32.88%       29.34%        8.53%
Averaging    61.16%       41.19%        17.73%
FedEL        61.53%       41.29%        20.43%

Table 5
Accuracy of different fine-tuning combinations.
Datasets                  CIFAR-10
non-IID                   α = 0.1    α = 0.5
Voting (0.01, 1)          65.29%     68.92%
Voting (0.01, 5)          64.24%     69.31%
Voting (0.01, 10)         61.73%     67.64%
Voting (0.1, 10)          61.72%     69.41%
Voting (0.001, 10)        65.36%     68.76%
Averaging (0.01, 1)       66.88%     70.13%
Averaging (0.01, 5)       65.61%     69.23%
Averaging (0.01, 10)      65.04%     68.93%
Averaging (0.1, 10)       61.49%     67.75%
Averaging (0.001, 10)     66.86%     70.14%
Table 4
Accuracy under different degrees of non-IID on Hyper-Kvasir.
Datasets     Hyper-Kvasir
non-IID      α = 0.1    α = 0.5    α = 5
FedAvg       75.78%     81.89%     86.81%
FedProx      77.72%     79.88%     86.21%
SCAFFOLD     69.45%     79.14%     81.52%
MOON         73.29%     84.12%     86.34%
FedDyn       78.80%     83.68%     86.36%
Voting       78.17%     84.86%     87.28%
Averaging    80.48%     85.11%     87.92%
FedEL        83.08%     85.27%     88.45%

Table 6
Reasoning time of different ensemble methods.
Datasets     CIFAR-10    CIFAR-100    Tiny-Imagenet    Hyper-Kvasir
Voting       3.33 s      8.94 s       29.85 s          28.47 s
Averaging    3.31 s      8.91 s       29.83 s          28.51 s
FedEL        2.36 s      3.94 s       9.02 s           26.97 s
Voting and Averaging perform better than the baseline FedAvg under all experimental settings. FedEL is also significantly better than the baseline and the other methods in all experimental settings, demonstrating that FedEL is effective not only for natural scene image data but also for medical image data.

4.5. Different learning rates and local updates

In Step2 of FedEL, the settings of the number of local updates and the learning rate are crucial when fine-tuning the classifier module. A large learning rate or too many local updates will increase the diversity of the classifiers, but much global information will be lost, leading to deterioration of the prediction performance of each single classifier. Conversely, too small a learning rate or too few local updates cannot ensure the diversity of the classifiers. On the CIFAR-10 dataset, we set two non-IID environments, α = 0.1 and α = 0.5, and test the performance of FedEL under different numbers of local updates and learning rates.

As shown in Fig. 4, the accuracy of FedEL surpasses the baseline by over 2.5% across the tested learning rates and numbers of local updates for both α = 0.1 and α = 0.5. Additionally, the accuracy fluctuation range is within 1%. This indicates that the settings of local updates and learning rate do not strongly affect the performance of FedEL, and that compared with FedAvg, FedEL improves the model accuracy under all of these settings. Table 5 shows the performance of Voting and Averaging under different learning rates and numbers of local updates; the test accuracy of both methods varies significantly with different learning rates and local updates.

4.6. Combination of FedEL with other methods

As described in Section 3, FedEL consists of four steps, with Step1 based on FedAvg in the previous experiments. To assess the compatibility of FedEL with other methods, we substitute Step1 with various approaches and conduct further experiments. The value of α in all experiments is 0.1. Fig. 5 displays the results, indicating that all methods combined with FedEL improve accuracy to varying degrees compared to the previous experimental results. Notably, combining SCAFFOLD with FedEL did not yield a significant accuracy improvement. We postulate that SCAFFOLD's use of control variates (variance reduction) to adjust the local updates may compromise the diversity of the weak classifiers and ultimately hinder the accuracy improvement.

4.7. Comparison of reasoning time

Voting and Averaging require multiple local models to be stored, whereas FedEL integrates the classifier modules into a single model, resulting in significant savings in storage space. Thus, we only compare the inference time of Voting, Averaging, and FedEL on the four datasets. Table 6 presents the experimental results. The results demonstrate that, compared to the traditional ensemble learning methods of Voting and Averaging, FedEL significantly reduces the inference time.

5. Conclusion and future work

Federated learning is a feasible method to use data information stored in different local locations while protecting data privacy. However, non-IID data poses a significant challenge in FL. We propose FedEL, a federated ensemble learning algorithm, to address this challenge. The key idea of FedEL is that each local model is a weak classifier in the late FL period.


Fig. 4. The top-1 test accuracy under different learning rates and local updates on CIFAR-10 dataset. (a) 𝛼 = 0.1. (b) 𝛼 = 0.5.

Fig. 5. The top-1 test accuracy of FedEL based on different methods. (a) CIFAR-10, α = 0.1. (b) CIFAR-100, α = 0.1. (c) Tiny-Imagenet, α = 0.1. (d) Hyper-Kvasir, α = 0.1.

Integrating this group of weak classifiers can achieve better results than model aggregation. The traditional FL algorithm ensures good predictive performance for this group of weak classifiers, while the data heterogeneity improves their diversity. Experimental results on natural and medical image datasets demonstrate that FedEL significantly improves model performance. Compared to traditional ensemble learning methods, FedEL effectively reduces storage and inference costs. Additionally, FedEL is compatible with other state-of-the-art FL algorithms. FedEL can improve the accuracy of global models in non-IID scenarios, and its training process is independent of the model architecture; therefore, it can be widely applied in many fields such as healthcare and finance. However, compared with existing research, FedEL has limitations. In large-scale scenarios, due to the increasing number of clients and the need to integrate all client classifier modules into one model when building the new model, the number of parameters of the model will greatly increase, resulting in higher storage requirements and longer inference time. In future work, we will consider introducing a clustering algorithm to group clients with similar data volume or distribution and select representatives from each cluster to participate in FedEL training. This can reduce the number of parameters in the newly constructed global model.

CRediT authorship contribution statement

Xing Wu: Conceptualization, Methodology, Funding acquisition, Project administration, Supervision, Writing – review & editing. Jie Pei: Writing – original draft, Data curation, Software, Implementation, Investigation, Formal analysis, Visualization. Xian-Hua Han: Supervision, Writing – review & editing. Yen-Wei Chen: Supervision, Writing – review & editing.


Junfeng Yao: Supervision, Writing – review & editing. Yang Liu: Supervision, Writing – review & editing. Quan Qian: Supervision, Writing – review & editing. Yike Guo: Supervision, Writing – review & editing.

Declaration of competing interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 62172267), the National Key Research and Development Program of China (Grant No. 2022YFB3707800), the State Key Program of the National Natural Science Foundation of China (Grant No. 61936001), the Shanghai Rising-Star Program, China (Grant No. 21QB1401900), and the Key Research Project of Zhejiang Laboratory (No. 2021PE0AC02).

References

Benbrahim, H., & Behloul, A. (2021). Fine-tuned Xception for image classification on Tiny ImageNet. In 2021 international conference on artificial intelligence for cyber security systems and privacy (pp. 1–4). IEEE.
Borger, T., Mosteiro, P., Kaya, H., Rijcken, E., Salah, A. A., Scheepers, F., et al. (2022). Federated learning for violence incident prediction in a simulated cross-institutional psychiatric setting. Expert Systems with Applications, 199, Article 116720.
Borgli, H., Thambawita, V., Smedsrud, P. H., Hicks, S., Jha, D., Eskeland, S. L., et al. (2020). HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data, 7(1), 283.
Cheng, Y., Liu, Y., Chen, T., & Yang, Q. (2020). Federated learning for privacy-preserving AI. Communications of the ACM, 63(12), 33–36.
Durmus, A. E., Yue, Z., Ramon, M., Matthew, M., Paul, W., & Venkatesh, S. (2021). Federated learning based on dynamic regularization. In International conference on learning representations.
Gamboa-Montero, J. J., Alonso-Martin, F., Marques-Villarroya, S., Sequeira, J., & Salichs, M. A. (2023). Asynchronous federated learning system for human–robot touch interaction. Expert Systems with Applications, 211, Article 118510.
Gao, L., Fu, H., Li, L., Chen, Y., Xu, M., & Xu, C.-Z. (2022). FedDC: Federated learning with non-IID data via local drift decoupling and correction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10112–10121).
Gong, M., Feng, J., & Xie, Y. (2020). Privacy-enhanced multi-party deep learning. Neural Networks, 121, 484–496.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., et al. (2021). Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2), 1–210.
Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S. J., Stich, S. U., & Suresh, A. T. (2019). SCAFFOLD: Stochastic controlled averaging for on-device federated learning.
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Citeseer.
Li, L., Fan, Y., Tse, M., & Lin, K.-Y. (2020). A review of applications in federated learning. Computers & Industrial Engineering, 149, Article 106854.
Li, Q., He, B., & Song, D. (2021). Model-contrastive federated learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10713–10722).
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2, 429–450.
Lin, T., Kong, L., Stich, S. U., & Jaggi, M. (2020). Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33, 2351–2363.
Liu, Y., Huang, A., Luo, Y., Huang, H., Liu, Y., Chen, Y., et al. (2020). FedVision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34, No. 08 (pp. 13172–13179).
McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273–1282). PMLR.
Paragliola, G., & Coronato, A. (2022). Definition of a novel federated learning approach to reduce communication costs. Expert Systems with Applications, 189, Article 116109.
Voigt, P., & Von dem Bussche, A. (2017). The EU General Data Protection Regulation (GDPR): A practical guide (1st ed.). Cham: Springer International Publishing.
Wang, J., Liu, Q., Liang, H., Joshi, G., & Poor, H. V. (2020). Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in Neural Information Processing Systems, 33, 7611–7623.
Wang, T., Zheng, L., Lv, H., Zhou, C., Shen, Y., Qiu, Q., et al. (2023). A distributed joint extraction framework for sedimentological entities and relations with federated learning. Expert Systems with Applications, 213, Article 119216.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Woźniak, M., Siłka, J., & Wieczorek, M. (2021). Deep neural network correlation learning mechanism for CT brain tumor detection. Neural Computing and Applications, 1–16.
Woźniak, M., Wieczorek, M., & Siłka, J. (2023). BiLSTM deep neural network model for imbalanced medical data of IoT systems. Future Generation Computer Systems, 141, 489–499.
Wu, X., Chen, C., Li, P., Zhong, M., Wang, J., Qian, Q., et al. (2022). FTAP: Feature transferring autonomous machine learning pipeline. Information Sciences, 593, 385–397.
Wu, X., Liang, Z., & Wang, J. (2020). FedMed: A federated learning framework for language modeling. Sensors, 20(14), 4048.
Wu, X., Pei, J., Chen, C., Zhu, Y., Wang, J., Qian, Q., et al. (2022). Federated active learning for multicenter collaborative disease diagnosis. IEEE Transactions on Medical Imaging.
Wu, X., Zhong, M., Guo, Y., & Fujita, H. (2020). The assessment of small bowel motility with attentive deformable neural network. Information Sciences, 508, 22–32.
Yang, Q., Fan, L., Tong, R., & Lv, A. (2021). White paper – IEEE federated machine learning. IEEE.
Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2), 1–19.
Zhu, H., Xu, J., Liu, S., & Jin, Y. (2021). Federated learning on non-IID data: A survey. Neurocomputing, 465, 371–390.
