

Distributed Learning for Wireless Communications:
Methods, Applications and Challenges

Liangxin Qian, Ping Yang, Senior Member, IEEE, Ming Xiao, Senior Member, IEEE,
Octavia A. Dobre, Fellow, IEEE, Marco Di Renzo, Fellow, IEEE, Jun Li, Senior Member, IEEE,
Zhu Han, Fellow, IEEE, Qin Yi, and Jiarong Zhao

Abstract—With its privacy-preserving and decentralized features, distributed learning plays an irreplaceable role in the era of wireless networks with a plethora of smart terminals, an explosion of information volume and increasingly sensitive data privacy issues. There is a tremendous increase in the number of scholars investigating how distributed learning can be applied to emerging wireless network paradigms in the physical layer, media access control layer and network layer. Nonetheless, research on distributed learning for wireless communications is still in its infancy. In this paper, we review the contemporary technical applications of distributed learning for wireless communications. We first introduce the typical frameworks and algorithms for distributed learning. Examples of applications of distributed learning frameworks in emerging wireless network paradigms are then provided. Finally, the main research directions and challenges of distributed learning for wireless communications are discussed.

Index Terms—Distributed learning, federated learning, wireless communications.

I. INTRODUCTION

WITH the fast development of smart terminals and emerging new applications (e.g., real-time and interactive services and the Internet-of-Things (IoT)), communication data traffic has drastically increased, and current communication networks cannot sufficiently match the quickly rising technical requirements [1]–[3]. As a result, the expectation and development of next-generation mobile networks, e.g., the sixth-generation mobile networks (6G), have attracted great attention [4]–[7]. Recently, machine learning-based methods have been viewed as a key enabler for 6G, since the key issues behind synchronization, channel estimation, equalization, multiple-input multiple-output (MIMO) signal detection, iterative decoding, and multi-user detection in communication systems can be solved by using carefully designed machine learning algorithms [8]–[10]. Beyond academia and industry, the standardization bodies are also considering including machine learning in future mobile networks [11]. For instance, in Release 16, 3GPP started to improve the data exposure capability to support data-driven machine learning [12].

To date, most existing machine learning approaches and solutions for communication networks require centralizing the training data and inference processes at a single data center [8], [10]. In other words, the collected data have to be first sent to a central server (or cloud) and analyzed, and then the results are sent back to the actuators. However, due to privacy constraints and limited communication resources for data transmission in networks, it is impractical for all communication devices engaged in learning to transmit all of their collected data to a data center or a cloud that can subsequently run a centralized learning algorithm for data analysis. To elaborate further, centralized machine learning approaches have inherent disadvantages that limit their practicality, such as significant signaling overhead, increased implementation complexity and high latency in dealing with communication problems [13]–[15]. Moreover, emerging wireless networking paradigms, e.g., cognitive radio networks, industrial control networks, device-to-device (D2D) communications and unmanned aerial vehicle (UAV)-based swarming networks, are inherently distributed [16]–[18]. Furthermore, in view of future applications, the centralized approaches may not be suitable for use cases that require low latency, such as controlling a self-driving car or sending instructions to a robotic surgeon. For mission-critical tasks, wireless systems must make quick and reliable decisions at the network edge.

To solve this massive scalability challenge while addressing privacy, latency, reliability and bandwidth efficiency, distributed learning frameworks [19]–[23], e.g., federated learning (FL) [24]–[26] and MapReduce [42], are needed; consequently, intelligence must be pushed to the network edge in future communication systems by using appropriate optimization algorithms, e.g., the alternating direction method of multipliers (ADMM) [27], [28] and distributed gradient descent [29].

Manuscript received September 5, 2021; revised January 20, 2022; accepted February 21, 2022. Date of publication March 14, 2022; date of current version May 17, 2022. This work is supported by the National Key R&D Program of China under Grant 2020YFB1807203, the National Science Foundation of China under Grant 61876033 and the National Science Foundation of China under Grant U19B2014. The guest editor coordinating the review of this manuscript and approving it for publication was Prof. T. Q. S. Quek. (Corresponding author: Ping Yang.)

Liangxin Qian, Ping Yang, Qin Yi, and Jiarong Zhao are with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Sichuan 611731, China (e-mail: 201921220237@std.uestc.edu.cn; yang.ping@uestc.edu.cn; yq1835940812@163.com; zhaojiarong21@163.com).

Octavia A. Dobre is with the Memorial University of Newfoundland, St. John's, NL A1B 3X9, Canada (e-mail: odobre@mun.ca).

Ming Xiao is with the Royal Institute of Technology, 114 28 Stockholm, Sweden (e-mail: mingx@kth.se).

Marco Di Renzo is with the Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes, 91192 Gif-sur-Yvette, France (e-mail: marco.di-renzo@universite-paris-saclay.fr).

Jun Li is with the Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: jun.li@njust.edu.cn).

Zhu Han is with the University of Houston, Houston, TX 77004 USA (e-mail: zhan2@uh.edu).

Digital Object Identifier 10.1109/JSTSP.2022.3156756

1932-4553 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.


In these frameworks, communication units/devices/nodes are capable of collaboratively building a shared learning model by training on their collected data locally. Considering their potential applications across industry, business, utilities and the public sector, distributed machine learning techniques have attracted significant research attention in communication system design. For example, FL has been proposed to enable users to collaboratively learn a shared prediction model while keeping their collected data on their devices, for user behavior prediction, user identification, and wireless environment analysis [24]. Similarly, to increase robustness, ADMM is widely considered for large-scale distributed learning. Likewise, distributed gradient descent methods are also studied for various potential applications [27], [28].

However, the field of decentralized/distributed machine learning is still in its infancy, as there are many open theoretical and practical problems yet to be addressed, such as robustness, privacy, communication costs, convergence, complexity and combinations with physical layer transmission networks [9], [19], [24]. To provide solutions to these challenging problems, it is necessary to take advantage of local and global information, including the background information (i.e., the environment knowledge) as well as the locally collected information. Advanced signal processing techniques are also required to achieve high robustness, ultra-low latency, massive connectivity, and ultra-high reliability through network/information cooperation.

Several papers have studied distributed learning in wireless communications, but their focus is less consistent. The authors of [30] summarize the technical challenges of distributed learning and existing frameworks and their limitations. Machine learning and communication techniques are also applied to achieve efficient communication. In addition, the authors of [31] address the challenges in distributed learning by focusing on three aspects: learning algorithms, system architecture and network infrastructure, and conduct an experimental study on communication optimization techniques. The authors of [42] provide a systematic summary of the algorithms and architecture of distributed machine learning, which focuses on the topology configuration and recovery of wireless networks, power management, wireless resource allocation, quality of service (QoS), and mobile edge computing (MEC).

In addition, there are articles that focus on FL. The authors of [39] compare federated learning with other machine learning methods and present applications of federated learning in edge computing and spectrum management. The authors of [32] present a distributed learning architecture for 6G and the challenges posed by the high performance requirements of 6G networks. The authors of [35] provide an introduction to the research and progress of federated learning in IoT systems. The authors of [33] classify the contributions of federated learning in research and industry, establish a classification of federated learning application domains, and provide a focused analysis of federated learning applications in privacy and resource management. The authors of [34] describe the future challenges of federated learning and introduce potential techniques to address them.

Against this background, and to explore whether distributed learning is suitable for wireless communication scenarios, we review the research works on distributed learning for wireless communications in recent years. The frameworks and algorithms of distributed learning and other meritorious variants are provided. Besides, we discuss the potential applications of distributed learning in wireless communications. In particular, we focus our attention on the physical layer, the medium access control layer, the network layer, and other novel fields such as blockchain and tensor-based technologies. Then, we also discuss the main future research directions and challenges that distributed learning may face in wireless communications, including the communication cost, low-latency communication, security, and robustness.

This paper is organized as follows. Section II reviews the conventional distributed learning architectures, algorithms, and their relevant variants. Potential distributed learning applications in wireless networks and primary future research directions as well as challenges are presented in Sections III and IV, respectively. Finally, Section V concludes the paper.

II. DISTRIBUTED LEARNING ARCHITECTURES AND ALGORITHMS

Research activities on distributed learning have been conducted for more than a decade, during which many frameworks and algorithms have emerged. The vast majority of the research content is based on the FL framework. In this section, we present some typical distributed machine learning architectures and algorithms.

A. Distributed Learning Architectures

1) Parameter Server Architecture: The parameter server framework, such as the 3GPP implementation in [36], is a classical architecture in distributed machine learning. It is the most widely used centralized multi-node machine learning approach, consisting of one or several server nodes and a number of worker nodes. Server nodes and worker nodes can send messages to each other, and the parameter model is shared globally. Worker nodes obtain the model parameters from the server nodes, compute updates (e.g., gradients) on their locally collected data sets, and return them to the server nodes. Then, the server nodes update the parameter model by using some optimization algorithm (e.g., stochastic gradient descent (SGD)). This process is repeated until the parametric model converges to a certain precision. Therefore, the computing cost is borne mainly by the worker nodes. Besides, since local data do not leave the worker nodes, this framework provides a certain degree of privacy protection.
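To make this round-trip concrete, the following minimal NumPy sketch simulates a parameter-server training loop on a toy least-squares task. The data, model size, step size and iteration count are illustrative assumptions only, not details taken from [36] or from any particular parameter-server system.

import numpy as np

rng = np.random.default_rng(0)
w_server = np.zeros(5)                                  # global parameter model kept at the server
workers = [(rng.normal(size=(20, 5)), rng.normal(size=20))
           for _ in range(4)]                           # (X_k, y_k): data that never leaves worker k

def local_gradient(w, X, y):
    # Worker-side computation: least-squares gradient on the local data set.
    return X.T @ (X @ w - y) / len(y)

for step in range(100):                                 # in practice: until a target precision is met
    grads = [local_gradient(w_server, X, y) for X, y in workers]
    w_server -= 0.1 * np.mean(grads, axis=0)            # server-side SGD update on aggregated gradients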


• Federated Learning: FL is an emerging distributed machine learning approach, first proposed by Google in [37], which can be seen as an extension of the parameter server architecture. The difference is that the worker nodes in the parameter server framework belong to the server nodes, so their computational performance and availability are guaranteed, while the worker nodes in FL are autonomous: their capacity, distribution of data samples and computational performance vary, and their availability is not stable either, which may cause some traditional algorithms, such as parallel stochastic gradient descent, to be inapplicable in FL.

The server nodes in FL need to aggregate the parameters uploaded by the worker nodes to update the global parameter model. The learning process of FL has two main phases: the local training phase and the global aggregation phase. Next, we utilize the federated averaging algorithm (FedAvg) to illustrate these two phases [37], [38].

In the local training phase, the server node first acts as a task publisher, selecting K devices from the candidate device set as working nodes and shielding the remaining devices. The server node sends the initialized global parameter model ω_0 to each device for training, ω_t ← ω_0. Taking the kth device as an example (k = 1, …, K), this device performs several rounds of parameter updates with local data, ω_t^k ← ω_t − α∇F(ω_t; k), where α denotes the learning rate and ∇F(ω_t; k) denotes the gradient of the loss function on the kth device. Afterwards, each device returns the updated parameters ω_t^k to the server node.

In the global aggregation phase, the server node aggregates the parameters computed by the local working nodes to perform the global parameter model update, ω_G = (1/K) Σ_{k=1}^{K} ω_t^k. The server then reselects K new devices as worker nodes, and sends the updated global parameter model to each device for a new round of iteration. The whole process is iterated over several rounds until a specific accuracy is met. The learning process of FedAvg is summarized in Algorithm 1.

Algorithm 1: Federated Averaging Algorithm (FedAvg).
Input: Number of worker nodes K, number of local training epochs I, number of global aggregation epochs J, learning rate α, global parameter model ω_G, local parameter model ω_t, and the gradient of the loss function ∇F(ω_t).
Output: ω_G
Local learning phase:
  Step 1: The worker nodes get the global parameter model ω_t that the server node has initialized or updated.
  Step 2: For local training epochs i = 1, …, I, the kth device does ω_t^k ← ω_t − α∇F(ω_t; k).
Global aggregation phase:
For global aggregation epochs j = 1, …, J, do
  Step 1: Randomly select K new devices as worker nodes and send the initialized or updated global parameter model ω_G to the worker nodes.
  Step 2: ω_G = (1/K) Σ_{k=1}^{K} ω_t^k.
Return result.

FL also differs from most traditional distributed learning in several aspects. First, the user at the worker node has control over the local device and data, and the user can control whether the device has sufficient computing power and memory to participate in the training. Second, the devices at the worker nodes are unstable, and they may vary greatly in computational power, battery capacity, and memory overhead. Third, the local data participating in FL are usually non-independent and identically distributed (non-i.i.d.), and the data vary in amount and size from one worker node to another [39]. Last, the communication overhead in FL is usually much larger than the computational overhead.

As an example, we compare the communication loads of centralized learning and FL in Fig. 1. The setup of FL is as follows: 100 agents are connected to a centralized parameter server to collaboratively learn a general model via the MNIST dataset [37] for digit recognition tasks. 60,000 training samples are uniformly partitioned over the 100 agents, and each sample is available at only one agent. For each communication round, 10 percent of the clients are randomly selected to upload their updated local FL models. We consider a convolutional neural network (CNN) consisting of two 5 × 5 convolutional layers, a fully connected layer and a final softmax output layer. As shown in Fig. 1, FL can significantly reduce the communication loads. Specifically, at communication round 50, the communication load of FL is 1/100 of that of centralized learning, and at communication round 250, the communication load of FL is still less than 1/10 of that of centralized learning.

Fig. 1. Communication loads versus communication rounds in centralized learning and federated learning.
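The two phases of Algorithm 1 can be sketched in a few lines of NumPy. The toy least-squares objective, the device data and all hyperparameters below are invented for illustration; only the structure (sample K devices, run I local updates, average the returned models) follows the FedAvg description above.

import numpy as np

rng = np.random.default_rng(1)
devices = [(rng.normal(size=(30, 5)), rng.normal(size=30)) for _ in range(100)]
w_G = np.zeros(5)                                   # global parameter model
alpha, I, J, K = 0.05, 5, 200, 10                   # learning rate, local/global epochs, devices per round

for j in range(J):                                  # global aggregation phase
    chosen = rng.choice(len(devices), size=K, replace=False)
    local_models = []
    for k in chosen:                                # local learning phase on device k
        X, y = devices[k]
        w = w_G.copy()
        for i in range(I):                          # w_t^k <- w_t - alpha * grad F(w_t; k)
            w -= alpha * X.T @ (X @ w - y) / len(y)
        local_models.append(w)
    w_G = np.mean(local_models, axis=0)             # w_G = (1/K) * sum_k w_t^k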


• Federated Learning Based on Fog Learning: Fog learning is a novel learning framework proposed in [40], which considers the network topology between devices, D2D communication and the collaboration among wired or wireless nodes. Fog learning encapsulates all IoT elements between the edge devices and the main server, such as edge computing devices, local area servers, UAVs, cloud servers and core servers, which gives it a multi-layer structure. Fog learning first clusters the devices at the bottom layer, which enables parameter or data sharing within the same group network. Second, the upper-layer servers are also clustered, and the computing nodes in each server are able to communicate and share parameters via wired links, wireless links or even various relay devices (e.g., reconfigurable intelligent surfaces, UAVs). Therefore, both horizontal communication and vertical parameter transfer among nodes are possible.

The learning process of fog learning can be roughly described as follows: the computational nodes in the bottom layer are trained through data and parameter sharing. The updated parameters are uploaded to the nodes in the upper layer, which first perform a local aggregation and then upload the result to the nodes in the next higher layer for another local aggregation. Finally, the parameters are sent to the core server node for global aggregation. The D2D communication and local aggregation features of fog learning meet the requirements of contemporary data-intensive and latency-sensitive applications to a certain extent. Besides, the upstream dimensionality reduction also significantly reduces the transmission traffic and the communication cost between different network layers.

2) Other Distributed Learning Architectures: The above learning frameworks are based on the parameter server paradigm, in which core server nodes act as aggregators for centralized learning. Some of the most widely adopted decentralized learning frameworks, which do not contain core server nodes, are introduced below.

• MapReduce: MapReduce is a parallel programming paradigm in which users can perform computations through map or reduce operations [42], [43]. In this framework, the input data is partitioned into several data blocks for parallel computation on multiple worker nodes. The MapReduce model consists of a Map phase and a Reduce phase. In the Map phase, the key-value pairs in the original input data are processed to produce a set of intermediate key-value pairs, and then all pairs with the same intermediate key are collected for processing in the Reduce phase. Finally, the processed data is output.

Although MapReduce is highly scalable, it has a serious weakness in machine learning as it does not support iteration. The authors of [44] propose an extension to the MapReduce programming paradigm called iterative MapReduce and develop an optimizer for iterative MapReduce programs covering most machine learning techniques. A distributed real-time optimization method for MapReduce frameworks in emerging cloud platforms supporting dynamic speed scaling capabilities is presented in [45]; it is capable of dynamically scheduling input data of sufficient size and synthesizing intermediate processing results based on the state of the application and the data center, and the proposed method is able to significantly improve throughput. It is shown in [46] how MapReduce parameters affect the distributed processing of machine learning programs that are supported by the Hadoop Mahout and Spark MLlib machine learning libraries. A virtualized cluster is built on Docker containers, and Hadoop parameters such as the number of replicas and the data block size are changed to measure DML performance.

• AllReduce: AllReduce is another paradigm for parallel programming. It mainly deploys an operation that reduces the task tensors in all worker nodes to a single tensor block and returns these blocks to them. One worker node needs to be selected as a master node to gather all task tensors, perform the reduction operations locally and then return the processed tensors to the other worker nodes. The authors of [47] proposed a synchronous AllReduce SGD algorithm for parameter updating. The authors of [48] proposed a communication-efficient asynchronous decentralized parallel SGD (D-PSGD) method to speed up training. The proposed algorithm is also compared with three state-of-the-art decentralized machine learning techniques, Prague, Allreduce-SGD and asynchronous decentralized parallel SGD, achieving speedups of 3.7, 3.4 and 1.9 times, respectively.

However, the bottleneck at the master node may increase the cost of communication and reduction operations as the number of worker nodes increases. Ring-AllReduce is an algorithm that alleviates this challenge. In the Ring-AllReduce model, no master node is selected, and each worker node can communicate with its neighbouring nodes. The computation of the task tensors is performed through exchanges between neighbouring nodes. In each communication between worker nodes, each worker node sends and receives a part of these task tensors. The received task tensor part is added to the corresponding part already processed at the node. This process iterates until all tasks have been transferred or processed. The authors of [49] studied the D-PSGD algorithm and showed that the decentralized algorithm may outperform the centralized algorithm in distributed stochastic gradient descent. Also, they compared the distributed PSGD algorithm with the CNTK framework implemented with AllReduce and showed that D-PSGD requires less inter-node communication. The authors of [50] used the correlation of gradients between nodes to improve the compression efficiency and proposed two examples of gradient compression according to the communication protocol, parameter server or Ring-AllReduce, respectively.

• Other Decentralized Learning Frameworks: Other decentralized learning frameworks are only briefly described below because they are not widely applied in wireless communications. The all-to-all (A2A) architecture in [31] and [41] has no central server, and the nodes use message passing or other similar functions to communicate data among themselves. This learning framework also supports only data parallelism, while the graph processing architecture supports model parallelism. The graph processing-based framework distributes the training data and model parameters among computational nodes within the same cluster, distributing the computational and communication overhead locally [42], [51].

Fig. 2 shows the basic frameworks of the parameter server architecture, FL, fog learning, MapReduce, and AllReduce.
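The chunk-wise exchange of Ring-AllReduce described above can be simulated directly. The sketch below (synthetic tensors, four workers) runs the standard reduce-scatter pass followed by an all-gather pass; real implementations additionally overlap these transfers with computation, which is not modeled here.

import numpy as np

N = 4                                                     # worker nodes arranged in a ring
tensors = [np.arange(2 * N, dtype=float) + r for r in range(N)]
chunks = [list(np.split(t.copy(), N)) for t in tensors]   # each tensor cut into N chunks

for step in range(N - 1):                                 # reduce-scatter: circulate partial sums
    sends = [chunks[r][(r - step) % N].copy() for r in range(N)]
    for r in range(N):
        c = (r - 1 - step) % N                            # chunk arriving from the left neighbour
        chunks[r][c] += sends[(r - 1) % N]
# now worker r holds the fully reduced chunk at index (r + 1) % N
for step in range(N - 1):                                 # all-gather: circulate completed chunks
    sends = [chunks[r][(r + 1 - step) % N].copy() for r in range(N)]
    for r in range(N):
        chunks[r][(r - step) % N] = sends[(r - 1) % N]

total = sum(tensors)
assert all(np.allclose(np.concatenate(chunks[r]), total) for r in range(N))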


Fig. 2. Distributed learning architectures. The workers of the same color in (a) indicate computing terminals with the same computational power and capacity, and local data of the same color indicate that they are i.i.d. The workers of different colors in (b) indicate computing terminals with different computational power and capacity, and local data of different colors indicate that they are non-i.i.d. Figure (c) shows the multi-layer learning structure of fog learning, where the nodes of the same layer (horizontal) perform local aggregations first, and the lower layer transmits the updated model parameters to the upper-layer (vertical) nodes for global aggregations. Figure (d) shows the process of the MapReduce architecture: the input data is divided into several blocks, which are then processed to produce intermediate key-value pairs, and identical keys are merged for processing to output the results. Figure (e) shows the process of the AllReduce architecture, which reduces the data in all worker nodes to a single data block and returns the processed blocks to them.

B. Distributed Learning Algorithms

1) Deep Learning: Deep learning (DL) is often used to automatically learn implicit functional relationships or data features between input and output data, avoiding a lot of manual operations (e.g., modeling complex systems, manually characterizing features). DL is widely used in areas such as voice/image processing, auto-encoders, sparse coding, and sparse channel estimation [52], [53].

A widely used DL method is the deep neural network (DNN), which is a neural network with multiple layers, including an input layer, several hidden layers and an output layer. DNNs use the nonlinear processing units between the multiple layers of the network to perform the computation, using optimization algorithms and the back-propagation mechanism to minimize the loss function and obtain the model parameters. The mapping relationship of a feed-forward DNN with L layers can be given by

y_L = F_{L−1}(F_{L−2}(···(F_1(y_0; w_1), ···); w_{L−2}); w_{L−1}),    (1)

where y_L is the output vector through L iterations, and w_l and F_l(r_l; w_l) represent the parameter and the activation function of the lth layer (l = 1, …, L), respectively.

The authors of [54] proposed a fully autonomous power allocation method based on distributed deep learning for cellular network-based IoT D2D communication, to achieve higher cell throughput by bringing the power set close to the optimum. In [55], an enhanced federated learning technique is proposed with an asynchronous learning strategy on the client side and a time-weighted aggregation of local models on the server. In the asynchronous learning strategy, the different layers of the deep neural network are divided into shallow and deep layers, and the parameters of the deep layers are updated less frequently than those of the shallow layers. Experimental results show that the asynchronous joint deep learning algorithm outperforms the baseline algorithm in terms of communication overhead and model accuracy. To reduce the communication cost in distributed deep learning, the authors of [56] proposed a sparse binary compression framework based on distributed deep learning, combining existing communication delay and gradient sparsification techniques with a new binarization method and optimal weight update coding. In [57], an entropy-based distributed deep learning method for gradient compression is proposed, mainly consisting of an entropy-based threshold selection algorithm and an automatic learning rate correction algorithm. The experimental results show that the method can achieve a gradient compression ratio of about 1000 times while keeping the accuracy constant or even higher compared to existing work.

2) Reinforcement Learning: Reinforcement learning (RL) is a learning algorithm that can cope with dynamic environments and control systems to maximize long-term benefits. It has been widely used in vehicle-to-everything (V2X) and MEC networks [58]. An intelligent agent takes an action in the initial state to interact with the environment and receives a corresponding reward to move on to the next state, until the state with the optimal reward is achieved.

In the case of Q-learning, for example, the intelligent agent interacts with the environment by taking an action a_t at state s_t according to a certain strategy (e.g., ε-greedy), and then observes the next state s_{t+1} and receives the reward r_t(s_t, a_t). As a result, the Q function is updated as

Q_{t+1}(s_t, a_t) ← Q_t(s_t, a_t) + α[r_t(s_t, a_t) + γ max_{a_{t+1}} Q_t(s_{t+1}, a_{t+1}) − Q_t(s_t, a_t)],    (2)

where Q_t(s_t, a_t) represents the Q value of state s_t when taking action a_t, α denotes the learning rate, γ denotes the discount factor, and max_{a_{t+1}} Q_t(s_{t+1}, a_{t+1}) represents the highest Q value among all actions under state s_{t+1}. This process is repeated until the action with the optimal Q value is obtained.
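A tabular implementation of update (2) fits in a few lines. The chain environment, rewards and hyperparameters below are toy assumptions chosen only to show the update rule; they do not correspond to any system in the cited works.

import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))                  # tabular Q function
alpha, gamma, eps = 0.1, 0.9, 0.1                    # learning rate, discount factor, exploration rate

def env_step(s, a):
    # Hypothetical dynamics: move left (a=0) or right (a=1) on a chain;
    # reward 1 only when the right end is reached.
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

s = 0
for t in range(2000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))  # eps-greedy policy
    s_next, r = env_step(s, a)
    # Update (2): Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = 0 if s_next == n_states - 1 else s_next      # restart the episode at the goal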


Q-learning requires computing the Q-values and storing them in a table, which performs very well in small-scale models. However, it does not perform well in large-scale models, because these practical models often have to compute more than ten thousand states, and the speed of learning will have a significant impact on the latency of the system [59]. In this case, deep reinforcement learning (DRL) or deep Q-learning (DQL) is employed to approximate the Q value by using neural networks instead of computing it, which speeds up the data processing and greatly reduces the system latency [60], [61].

In large-scale machine-type communication scenarios, the authors of [62] proposed a distributed Q-learning-assisted unauthorized random access scheme to mitigate inter-device conflicts. The authors of [63] proposed a novel collaborative distributed Q-learning mechanism for resource-constrained machine-type communication devices, enabling them to find unique random access time slots for their transmissions and to reduce possible conflicts. Simulation results show that the proposed learning scheme can significantly reduce random access channel congestion in cellular IoT. The authors of [97] proposed an improved distributed Q-learning algorithm to address the optimization of the energy efficiency and delay trade-offs in a cellular network underlaid with energy harvesting device-to-device communication. Simulation results show that the proposed algorithm can obtain performance comparable to classical centralized reinforcement learning at a faster convergence rate, at the cost of additional signaling overhead.

3) Alternating Direction Method of Multipliers: ADMM decomposes a problem into several parallel sub-problems and orchestrates the overall scheduling across them to solve the original problem [64]. In the ADMM algorithm, the "multiplier method" refers to a dual ascent method using the augmented Lagrangian function (with a quadratic penalty term). Further, the "alternating direction" means that two variables are updated alternately, and this alternating update of the two variables is the key reason for the decomposition of the problem.

The optimization problem solved by the ADMM algorithm is as follows:

min f(x) + g(z)   s.t.  Ax + Bz = c,    (3)

where x and z are the optimization variables, x ∈ R^n, z ∈ R^m, A ∈ R^{p×n}, B ∈ R^{p×m}, and c ∈ R^p. Assume that both f(x) and g(z) are convex functions. The augmented Lagrangian function can be obtained as follows:

L_ρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2,    (4)

where y denotes the Lagrangian multiplier vector and ρ > 0 is the penalty factor. The iterative process of the ADMM algorithm is described as follows:

x^{k+1} = argmin_x L_ρ(x, z^k, y^k),
z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k),    (5)
y^{k+1} = y^k + ρ(Ax^{k+1} + Bz^{k+1} − c).

In ADMM, the variables x and z are updated in an alternating manner. The advantage of ADMM is that when f(x) and g(z) are separable functions, the update of x and z is decomposed into two steps, so that the tasks can be assigned to different nodes for processing, enabling a more efficient distributed optimization algorithm.

To solve large-scale problems, the authors of [65] proposed the proximal Jacobian ADMM for parallel and distributed computing with parallel variable updates. In order to reduce the number of communication rounds, the communication-censored linearized ADMM (COLA) was proposed in [66], where the nodes do not communicate directly after each update but have to wait for a communication review. By reviewing the update information of the nodes, if the difference between two iterations is small, the nodes can continue to use the previous iteration values for computation, and they communicate only when the difference is sufficiently large. In order to reduce the amount of data transmitted during communication, the authors of [67] proposed a quantized ADMM. To solve the problem of working points competing for communication resources in the network, the authors of [68] proposed group ADMM (GADMM), which divides the network nodes connected in a line into two groups, head and tail, such that each working point in the head group is connected to other working points through two tail working points. The working points in the head group update their model parameters, and each head working point transmits its updated model to its directly connected tail neighbors. The tail working points then update their model parameters to complete an iteration. In this way, each working point (except the edge working points) communicates with only two neighbors to update its parameters. With GADMM, only half of the working points compete for the limited bandwidth in each round of communication.

4) Other Distributed Algorithms: Apart from the mainstream algorithms described above, there are some other commonly used distributed algorithms.

• DANE: A communication-efficient distributed approximate Newton (DANE) algorithm is proposed in [69], [70] for solving stochastic optimization and learning problems. The DANE algorithm performs two distributed averaging calculations per iteration. First, the host obtains the global gradient by averaging the local gradients over all machines and sends it to all machines. Then, each machine independently updates the parameters based on the received global gradient and the local optimization problem. Finally, the host receives the local parameters from each machine and obtains the global parameters by averaging over the local parameters. The DANE algorithm converges faster than the gradient descent algorithm and also avoids the high computational complexity of the traditional Newton method caused by the need to calculate the inverse of the Hessian matrix.

Fig. 3. Applications of distributed learning for wireless communications.

Fig. 4. Taxonomy of applications of distributed learning for wireless communications in each network layer.

• CoCoA: The authors of [71] proposed the CoCoA framework for distributed computing environments. It is a communication-efficient primal-dual framework that exploits convex duality to decompose the global problem into subproblems solved in parallel, then solves the subproblems with local solvers on each machine, and finally uses the primal-dual structure of the global problem to efficiently combine the local updates. Two key advantages of CoCoA are its communication efficiency and its ability to use off-the-shelf single-machine solvers internally: (1) sharing information between machines through a highly flexible communication scheme allows for significant reductions of the communication overhead in a distributed environment; (2) allowing the use of arbitrary local solvers in parallel on each machine lets the framework directly incorporate state-of-the-art, application-specific stand-alone solvers into a distributed setup. The CoCoA method is generalized and improved in [72], making the theoretical convergence rate applicable to both smooth and non-smooth losses, and giving a more general framework.
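Before moving to the applications, a minimal NumPy instance of the ADMM iterations (5) may be helpful. It specializes (3) to the lasso problem, f(x) = ½‖Dx − b‖_2², g(z) = λ‖z‖_1, with A = I, B = −I and c = 0; the synthetic problem data and the choice ρ = 1 are arbitrary illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
D, b = rng.normal(size=(50, 10)), rng.normal(size=50)   # synthetic problem data
lam, rho = 0.1, 1.0
x = np.zeros(10); z = np.zeros(10); y = np.zeros(10)
P = np.linalg.inv(D.T @ D + rho * np.eye(10))           # factor once, reuse at every iteration

def soft(v, t):
    # Soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for k in range(200):
    x = P @ (D.T @ b + rho * z - y)     # x-update: argmin_x L_rho(x, z^k, y^k)
    z = soft(x + y / rho, lam / rho)    # z-update: argmin_z L_rho(x^{k+1}, z, y^k)
    y = y + rho * (x - z)               # dual update: y^{k+1} = y^k + rho (x^{k+1} - z^{k+1})

Because f and g are separable here, the x-update and z-update can also be assigned to different nodes, which is exactly the distributed appeal of ADMM noted above.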


III. APPLICATIONS OF DISTRIBUTED LEARNING TO WIRELESS COMMUNICATIONS

With the gradual deployment of IoT devices, the coverage of smart hardware devices further increases, while the requirements of 5G/6G systems for low-latency and ultra-reliable communications, together with a variety of smart communication scenarios, prompt the development of wireless communication systems endowed with a distributed architecture. Next, we present the applications of distributed learning frameworks in some potentially novel communication scenarios; we show the potential application scenarios in Fig. 3 and offer a taxonomy of those applications in Fig. 4.

A. Physical Layer

1) MIMO Communications: Large-scale MIMO technology has wide applications in 5G systems and plays an important role in the improvement of communication capacity and coverage. Large-scale MIMO systems can be efficiently optimized by utilizing distributed machine learning.

In [73], a new cell-free large-scale multiple-input multiple-output (CFm-MIMO) network scheme is proposed to support the FL framework. It is also shown how the performance of FL can be improved by minimizing the training time, and the impact of the local accuracy, the user processing frequency, etc., on the effective training time is analyzed. In [74], a distributed algorithm is used for gradient estimation in large-scale MIMO systems. A central server equipped with a large number of antennas works collaboratively with multiple wireless devices, and the central processor accurately estimates the gradient vectors from the wireless devices. The new gradient estimation algorithm, which exploits the sparsity of the local gradient vectors, has been validated on the MNIST dataset, and its performance is very close to that of the centralized algorithm. Moreover, it reduces the computational complexity by more than 70% compared to the linear minimum mean squared error (LMMSE) method. Distributed algorithms can also be applied to solve dynamic resource allocation problems in multi-cell MIMO; in [75], the authors use collaborative deep learning and game theory to train the base stations and master their reconciliation strategies. The algorithm automatically optimizes the spectrum allocation of all users and later translates it into a power allocation problem. All base stations obtain the optimal system capacity by iteratively allocating the power until convergence to the equilibrium state.

2) Communications With Reconfigurable Intelligent Surfaces: The propagation of electromagnetic waves is largely uncontrollable due to scattering and multipath in complex environments. Reconfigurable intelligent surface (RIS) is a general term for a class of special surfaces that can change the propagation characteristics of incident signals [76]. An RIS consists of a large number of passive units, and by adjusting the parameters of the structural units on the surface, the incident signals can be changed in amplitude and phase to enhance the useful signal quality and improve the system performance. The combination of RIS and FL can protect users' privacy while improving spectrum utilization. The authors of [77] apply both RIS and FL in smart IoT systems. The authors of [78] and [79] use RISs to reduce the dropout problem in over-the-air FL, and the authors of [80] employ RISs to achieve over-the-air model aggregation without channel state information at the transmitters in a joint edge learning system. To increase the throughput, the authors of [81] use RISs to combine over-the-air FL and non-orthogonal multiple access (NOMA) in a single framework. Although FL does not directly transmit raw data, the local model transmitted by each edge device reflects information about the local data to a certain extent. To further mitigate privacy breaches caused by small changes in the data source, the authors of [82] introduce differential privacy in a RIS-assisted wireless FL system. In addition, the authors of [83] utilize RIS- and FL-assisted millimeter-wave channel estimation to maximize the achievable rate, by training the optimal model through FL and establishing a mapping function between the channel state information (CSI) and the RIS configuration matrix.

B. Media Access Control Layer

Since the physical layer performs bitstream transmission and cannot perform error control, solutions to ensure the reliable transmission of data have to be implemented at the media access control layer. To ensure reliable transmission, one of the most important aspects is to optimize the channel allocation so as to ensure transmission quality, reduce transmission conflicts, and maximize the spectrum efficiency.

In distributed systems, a large number of edge users and central processors work in coordination for information transmission. To prevent mutual interference or interference to other users, dynamic spectrum access techniques are often used for effective transmission while improving spectrum utilization efficiency. A distributed adaptive learning and access strategy is proposed in [84]. Each user learns to dynamically adjust its channel selection to avoid collisions, based on its own historical access to the communication, the channel availability and the collision situation of all users in the network. Specifically, the channel selection problem is transformed into a non-cooperative policy game problem. The resulting algorithm is valid for a variety of average availability distributions across primary channels. A multi-carrier dynamic spectrum access cross-layer technique with adaptive power allocation is proposed in [85]. This approach decomposes the spectrum and power allocation problem into two sub-problems to be solved separately. In the first stage, the spectrum allocation problem is solved by a learning approach. In the second stage, the power allocation problem is solved by a conventional optimization solver. In [86], the dynamic spectrum access method under highly dynamic interference scenarios is analyzed, and a distributed multi-agent strategy is proposed, where all nodes can effectively predict and avoid dynamic interference and reduce collision conflicts.

For different application scenarios, different channel allocation methods can be targeted. For example, the authors of [87] propose a distributed Q-learning algorithm to mitigate the collision problem of random channel selection in massive machine-type communication (mMTC) scenarios. The effectiveness of the algorithm is demonstrated by the access success probability.
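In the spirit of the distributed access strategies above, the following toy simulation lets each user maintain a probability vector over channels and reinforce collision-free choices (a linear reward-inaction rule). The numbers of users and channels and the update rule are invented for illustration and are not the schemes of [84]–[87].

import numpy as np

rng = np.random.default_rng(4)
n_users, n_channels, lr = 6, 8, 0.2
pref = np.full((n_users, n_channels), 1.0 / n_channels)  # per-user channel selection probabilities

for slot in range(3000):
    picks = [rng.choice(n_channels, p=p) for p in pref]  # each user picks a channel independently
    counts = np.bincount(picks, minlength=n_channels)
    for u, c in enumerate(picks):
        if counts[c] == 1:                               # success iff no other user chose channel c
            target = np.zeros(n_channels); target[c] = 1.0
            pref[u] = (1 - lr) * pref[u] + lr * target   # move toward the successful channel
            pref[u] /= pref[u].sum()                     # guard against floating-point drift
        # on a collision, leave the preferences unchanged (reward-inaction)

With enough slots and n_users ≤ n_channels, the users typically settle on disjoint channels without any central coordinator or message exchange.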

In [88], a cellular spectrum sharing scenario is considered, and a dynamic access mechanism for distributed mobile network operators (MNOs) sharing cellular frequency bands is proposed. Simulation results show that the scheme not only improves the throughput, but also ensures fairness among operators. There also exist relatively novel approaches to channel allocation, such as [89], where the advantages of traditional graph models and physical models are combined by building a hypergraph interference model for analysis. The channel allocation problem is transformed into a local altruistic game problem, and the simulation results demonstrate that the spectrum efficiency can be significantly improved.

C. Network Layer

1) Device-to-Device Communications: Agents in distributed learning systems are usually connected in a star topology, e.g., in a master-worker architecture, or in a D2D topology. D2D technology allows two user nodes to play the roles of both server and client and to communicate directly, alleviating the communication interference in the cell [90], [91]. At present, terminal devices are becoming more intelligent and have stronger computing power, which gives the development of D2D communications a good prospect.

Distributed algorithms are often used to solve problems such as resource allocation in D2D. A distributed channel allocation scheme for end-to-end communications is proposed in [89], which transforms the channel allocation problem into a local altruistic game by hypergraph modeling and finds its Nash equilibrium. This distributed algorithm significantly improves the spectral efficiency [92]. In [93], a binary log-linear learning algorithm (BLLA) is proposed for the D2D wireless network resource allocation problem under the noisy-potential games framework. As in [89], it also converges to the Nash equilibrium of the resource allocation game.

Rational allocation of communication resources by distributed algorithms can effectively reduce transmission power and maximize throughput. For example, in [94], the resource allocation problem for distributed two-dimensional communication in heterogeneous networks is considered to reduce the total transmission power. In [95], the channel and power allocation problem is solved using distributed federation. A fully autonomous power allocation method based on distributed learning for IoT D2D communications is proposed in [96]; besides improving the capacity, it ensures that every D2D communication device can use the same model after the training is completed. In addition to pure resource allocation performance optimization schemes, some distributed algorithms for D2D communications also consider the balance between resource allocation rationality and delay [97], [98]. Further, in [99], distributed learning is used to predict the quality of service of D2D communications and the interference it generates, and to select the optimal communication policy from the network perspective.

2) Unmanned Aerial Vehicle Communication Networks: UAV communication networks are widely considered for military and civilian applications, industrial data transmission, anti-jamming, surveillance and reconnaissance. Due to the high mobility and flexibility of UAVs, communication service facilities can be rapidly deployed by UAVs, making UAV-assisted communication systems advantageous in many application scenarios. The work in [100] investigates FL-assisted multi-UAV networks for scene image classification tasks. The large amount of data generated by drone devices requires high network bandwidth for transmission to the server, consuming the energy of the drone. Moreover, the generated data may contain private information such as location. To protect the privacy of the data, a distributed learning solution is used to efficiently process the datasets generated by UAV devices.

The learning convergence of UAV swarms is affected by the wireless channel, because the updates of the learning model are transmitted through the wireless network. In [101], the influence of wireless factors on the FL convergence is investigated and optimized by jointly designing the power allocation and scheduling of the UAV network.

In order to provide energy-efficient UAV-based communication network services, the UAVs need to be deployed at suitable locations to guarantee transmission efficiency. In [102], the UAVs are deployed as wirelessly powered users to achieve sustainable FL-based wireless networks. The UAV transmission power efficiency is maximized through the joint optimization of transmission time and bandwidth allocation, power control, and UAV placement. However, UAVs deployed as airborne base stations for wireless communication with ground users may compete for limited RF resources and cause interference to ground equipment. In addition, the limited energy of UAVs hinders the applicability of UAVs that use RF to provide high-data-rate communication. In [103], the deployment of UAV networks based on visible light communications is investigated. The authors proposed an FL framework based on a convolutional autoencoder machine learning algorithm to predict the light distribution across the service area and determine the optimal UAV deployment that minimizes the total UAV transmit power.

Real-time control of UAVs helps to accomplish the mission when responding to critical tasks such as disaster scenes and rescue missions. Moreover, real-time control of UAV positions is required to reduce collisions between UAVs. The authors of [104] investigate the online path control of large-scale UAVs for efficient communication and propose a mean-field game strategy based on FL to reduce communication costs. The authors of [105] study the radio mapping and path planning problem for cellular-connected UAV networks and propose a path planning algorithm based on fast-search random trees. In [106], the authors introduce a framework based on decentralized DRL to navigate each UAV in a distributed manner and to provide long-term communication coverage for ground mobile users.


other computer technologies, which is essentially a decentral- IV. RESEARCH DIRECTIONS AND CHALLENGES OF
ized database [107]. In order to secure large-scale intelligent DISTRIBUTED LEARNING FOR WIRELESS COMMUNICATIONS
applications, blockchain-based distributed machine learning can Although distributed learning is not a new technology and
be used as a solution.
has been researched for many years, there is still a long way for
Due to the slow convergence speed of distributed learning op-
its practical implementation. In this section, we discuss future
timization algorithms, high requirements for computational and research directions and their challenging issues in distributed
memory resources, and training difficulties, the authors of [108]
learning for wireless communications, as is summarized in
investigate blockchain-based distributed learning and propose
Fig. 5.
a distributed computing framework for the limited-memory
BFGS algorithm (a method to solve unconstrained nonlinear
programming problems) based on variance reduction to speed A. Communication Cost
up the convergence process by reducing the variance of gradient Distributed learning relies on frequent communication rounds
estimation during stochastic iterations. between working nodes to exchange the parameters to complete
FL is implemented through a central server that aggregates the training. The high communication cost is a serious bottleneck
all local model updates to produce a global model update. Each due to uplink bandwidth limitations and slow, unreliable network
device then downloads the global model update and computes connections between the worker nodes and the central server.
its next local model update until the global model training is There are generally two ways to reduce the communication
complete. FL relies on a single central server, and a server failure overhead during model training: reducing the traffic and the
can result in inaccurate global model updates, thus making all frequency of communication.
local model updates incorrect. Also, FL does not reward local 1) Reducing the Traffic of Communication: Gradient com-
devices that contribute more to the global training, such that pression is an effective method to reduce the communication
devices with more data samples are less willing to participate to content without changing the model structure and communica-
the training with devices having fewer data samples. The authors tion process. Gradient compression reduces the amount of data
of [109] and [110] propose a blockchain FL (BlockFL) architec- to be transmitted by gradient quantization or gradient sparsifica-
ture where the blockchain network is able to exchange and verify tion. Gradient quantization, where each gradient value is repre-
local model updates of devices while providing corresponding sented using fewer bits, reduces the bit width of the gradient. For
rewards. BlockFL overcomes the single point of failure problem example, the authors of [116] propose a federally trained ternary
and facilitates training among more devices. quantization (FTTQ) algorithm, which optimizes the quantiza-
Traditional blockchain consensus mechanisms such as proof of work can cause great resource consumption and greatly reduce the efficiency of FL. In order to solve the device asynchrony and anomaly detection problems in FL, while avoiding the extra resource consumption caused by the blockchain, the authors of [111] propose a framework for enhancing FL in blockchain systems based on a directed acyclic graph (DAG-FL). The DAG-FL algorithm can well accommodate the asynchronous nature of devices and allows nodes to participate in FL iterations without considering the state of other nodes. At the same time, the workload of model validation is distributed to each node in DAG-FL, enabling anomaly detection and immunity to anomalous nodes.
2) Tensor Optimization Based Distributed Learning: FL requires the exchange of model parameters between nodes at each model update, which incurs significant communication costs. To reduce the size of the transmitted model parameters, tensor decomposition is an effective approach that uses low-rank representations to approximate the high-dimensional model parameters and significantly reduces their number without reducing the classification accuracy [112]. For example, the authors of [113] propose an FL gradient compression algorithm based on the tensor-train decomposition. In the framework of edge computing, the authors of [114] introduce a distributed hierarchical tensor deep computation model for FL, which compresses the model parameters from a high-dimensional tensor space to a set of low-dimensional subspaces to reduce the bandwidth and energy consumption of FL. In addition, the authors of [115] propose a sparse tensor compression communication framework applicable to distributed DNN training.
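To make the low-rank idea concrete, the following minimal sketch (our own simplification in Python with NumPy, not the tensor-train scheme of [113] or the hierarchical model of [114]) factorizes an approximately low-rank weight block with a truncated SVD before transmission; the block size and rank are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
# Trained weight blocks often have fast-decaying spectra; synthesize
# an approximately rank-20 block plus a small full-rank perturbation.
W = (rng.normal(size=(256, 20)) @ rng.normal(size=(20, 256))
     + 0.01 * rng.normal(size=(256, 256)))

def low_rank_compress(W, rank):
    # Keep the top-`rank` singular triplets; the two factors A and B
    # are what would be transmitted instead of W itself.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

A, B = low_rank_compress(W, rank=20)
sent, total = A.size + B.size, W.size
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"values sent: {sent}/{total} ({sent / total:.1%}), "
      f"relative reconstruction error {rel_err:.4f}")

In this toy setting, roughly 16% of the original values are exchanged at a negligible reconstruction error, the same accuracy-versus-bandwidth tradeoff that the tensor-based schemes above exploit at scale.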
IV. FUTURE RESEARCH DIRECTIONS AND CHALLENGES

A. Communication Cost

Distributed learning relies on frequent communication rounds between working nodes to exchange the model parameters during training. The high communication cost is a serious bottleneck due to uplink bandwidth limitations and slow, unreliable network connections between the worker nodes and the central server. There are generally two ways to reduce the communication overhead during model training: reducing the traffic and reducing the frequency of communication.

1) Reducing the Traffic of Communication: Gradient compression is an effective method to reduce the communicated content without changing the model structure or the communication process. Gradient compression reduces the amount of data to be transmitted by gradient quantization or gradient sparsification. Gradient quantization, where each gradient value is represented using fewer bits, reduces the bit width of the gradient. For example, the authors of [116] propose a federated trained ternary quantization (FTTQ) algorithm, which optimizes the quantized network at the working nodes through a self-learning quantization factor, and the authors of [117] use Grassmannian codebooks for the quantization of high-dimensional stochastic gradient vectors. Gradient sparsification, on the other hand, selectively transmits gradients according to a specific threshold, reducing the number of gradients that need to be transmitted. For example, the authors of [118] and [119] propose the general gradient sparsification (GGS) adaptive optimization framework, the sparse binary compression (SBC) framework is proposed in [120], and the sparse ternary compression (STC) framework for FL is proposed in [121]. Gradient sparsification can achieve a higher compression rate than gradient quantization, but it can seriously affect the convergence and accuracy of the model. The standard deviation-based adaptive gradient compression (SDAGC) method proposed in [122] can achieve higher model performance in training.
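Both compression families can be illustrated in a few lines. The sketch below (a generic NumPy illustration under our own simplifying assumptions, not the exact FTTQ [116], GGS [118] or SDAGC [122] algorithms) applies ternary quantization and top-k sparsification with error feedback, where the residual dropped in one round is kept locally and re-injected in the next round.

import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)          # a worker's local gradient

def ternary_quantize(g):
    # Map each entry to {-s, 0, +s}: only the scale s and two bits
    # per entry need to be transmitted.
    s = np.abs(g).mean()
    return s * np.sign(g) * (np.abs(g) > s)

def topk_sparsify(g, k):
    # Transmit only the k largest-magnitude entries (values + indices).
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

residual = np.zeros_like(grad)        # error-feedback memory at the worker
compensated = grad + residual
sent = topk_sparsify(compensated, k=50)
residual = compensated - sent         # dropped mass is kept for next round

print("ternary quantization error:",
      round(float(np.linalg.norm(ternary_quantize(grad) - grad)), 3))
print("top-k (50/1000) error     :",
      round(float(np.linalg.norm(sent - grad)), 3))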
As an alternative FL framework for reducing the communication overhead, federated distillation (FD) has recently been proposed, which only requires the devices to exchange their average model outputs. A wireless protocol for FD and an enhanced version of it are studied in [127]. Moreover, FD can be applied simultaneously with other techniques. The authors of [128] introduce a two-step joint learning framework, referred to as robust federated augmentation and distillation (RFA-RFD), which improves the communication efficiency while preserving data privacy.
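The bandwidth argument for FD is easy to see in code. In the minimal caricature below (our own construction, not the exact protocol of [127] or RFA-RFD [128]), each device uploads only its average per-class logit vectors, and the server forms for every device a leave-one-out average to be used as that device's distillation target; all sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_devices, n_classes, model_params = 10, 10, 100_000

# Each device computes the average logit vector of its local model for
# every class; this small matrix is all that is uploaded in FD.
local_logits = rng.normal(size=(n_devices, n_classes, n_classes))

mean_logits = local_logits.mean(axis=0)
# Leave-one-out distillation target for each device k.
targets = [(mean_logits * n_devices - local_logits[k]) / (n_devices - 1)
           for k in range(n_devices)]

print(f"FD uplink payload per round: {n_classes * n_classes} values, "
      f"versus {model_params} model parameters for weight-sharing FL; "
      f"targets prepared for {len(targets)} devices")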


Fig. 5. Future research directions and challenges of distributed learning for wireless communications.

2) Reducing the Frequency of Communication: One option to reduce the number of communication rounds is to increase the convergence speed of the training algorithm, for example, by the decentralized gradient descent (DGD) [130], momentum gradient descent (MGD) [131], overlap local-SGD [132], decentralized ADMM [133], asynchronous decentralized consensus ADMM [134] and proximal Jacobian ADMM [65] methods. There also exist some studies that incorporate censoring in distributed learning, where workers transmit only highly informative updates and eliminate unnecessary communication: if the difference between a worker's current gradient and its previously transmitted gradient is small, the communication is skipped and the server reuses the gradient sent previously. The authors of [123] investigate an orthogonal approach that identifies irrelevant updates made by the workers and prevents them from being uploaded, based on the feedback provided by the server about the model updates. Censoring in distributed learning reduces communication, but some useful information may be lost. In [124], the authors study an ordered gradient method that uses sorting to eliminate some of the worker-to-server upstream communication rounds typically required in gradient descent methods. The authors of [125] and [126] study gradient coding to reduce the communication cost while also reducing the latency caused by slow-running machines. In particular, the authors of [129] apply the two-stream model commonly used in transfer learning and domain adaptation to FL: a two-stream model is trained on each client instead of a single model, and a maximum mean discrepancy constraint is introduced into the training iterations of FL, forcing the local two-stream model to learn more from other devices and thus reducing the number of communication rounds.
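The frequency-reduction principle is captured by periodic model averaging (local SGD): each worker takes several cheap local steps between synchronizations. The toy least-squares sketch below is a minimal illustration under our own assumptions (synthetic data shards, H = 10 local steps, step size 0.05), not a reproduction of any of the cited algorithms.

import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, H, lr = 5, 20, 10, 0.05   # H local steps per sync
w_true = rng.normal(size=dim)

# Each worker holds a private least-squares shard.
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(100, dim))
    shards.append((X, X @ w_true + 0.1 * rng.normal(size=100)))

w = np.zeros(dim)
for _ in range(20):                        # only 20 communication rounds
    local_models = []
    for X, y in shards:
        wk = w.copy()
        for _ in range(H):                 # cheap local SGD steps
            i = rng.integers(0, len(y), size=10)
            wk -= lr * X[i].T @ (X[i] @ wk - y[i]) / len(i)
        local_models.append(wk)
    w = np.mean(local_models, axis=0)      # one parameter exchange

print("model error after 20 rounds:", float(np.linalg.norm(w - w_true)))

Raising H cuts the number of synchronizations roughly in proportion, at the cost of local models drifting apart between rounds, which is exactly the tension the methods above try to manage.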
B. Resource Allocation

The massive number of mobile devices and the growing demand for wireless services have brought about an explosion of data traffic and mobile connections, resulting in a tighter supply of wireless spectrum resources. A reasonable and effective resource allocation strategy can effectively reduce interference and improve the data transmission rate [135], [136].

In [137], user scheduling and power allocation schemes in FL uplink wireless fading channels are studied to maximize the data rate by using NOMA as the transmission scheme for updating the FL model. In addition to the data rate, common optimization objectives include reducing the power consumption and increasing the throughput. The authors of [96] propose a fully autonomous power allocation method for D2D communications in cellular networks based on distributed deep learning to maximize the total throughput of the D2D links. The authors of [138] study the optimal power control problem in wireless fading channels to minimize the model aggregation error by jointly optimizing the transmit power of each device and the denoising factor of the edge server. In [139], the distributed joint channel and power allocation problem in D2D communication networks is studied using a game-theoretic learning approach. The joint channel and power allocation problem is formulated as a multi-agent learning problem with discrete policy sets, and a fully distributed learning algorithm is proposed to determine the channels and power levels used by each device pair. In [85], distributed learning-based adaptive power allocation methods for multi-carrier dynamic spectrum access across layers are investigated, allowing dynamic spectrum access (DSA) to efficiently locate and exploit unused spectrum opportunities. In [140], the problem of distributed power allocation for edge users in decentralized wireless networks is studied, and a joint learning framework based on federated cooperation and augmentation (FL-CA) is proposed. Each edge device obtains its power allocation policy locally by training a local actor-critic model, and then periodically uploads the gradients and weights generated by the actor network to the base station for information aggregation.
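As a toy illustration of fully distributed power control, the sketch below lets each D2D pair run an independent, stateless Q-learning agent over a discrete set of power levels and receive its own rate minus a power cost as reward. It is our own construction in the spirit of the learning-based schemes discussed above (e.g., [96], [139]); the channel gains, the reward and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_pairs = 4
levels = np.array([0.1, 0.5, 1.0])                  # candidate tx powers
G = rng.uniform(0.1, 1.0, size=(n_pairs, n_pairs))  # cross-link gains
np.fill_diagonal(G, 2.0)                            # strong direct links
noise, eps, alpha = 0.05, 0.1, 0.2

Q = np.zeros((n_pairs, len(levels)))                # one Q-table per pair

def rewards(actions):
    p = levels[actions]
    signal = np.diag(G) * p
    interference = G @ p - signal                   # received from others
    sinr = signal / (interference + noise)
    return np.log2(1.0 + sinr) - 0.5 * p            # rate minus power cost

for _ in range(2000):
    # Every pair acts on purely local information (epsilon-greedy).
    a = np.where(rng.random(n_pairs) < eps,
                 rng.integers(0, len(levels), n_pairs),
                 Q.argmax(axis=1))
    r = rewards(a)
    Q[np.arange(n_pairs), a] += alpha * (r - Q[np.arange(n_pairs), a])

print("learned power levels per pair:", levels[Q.argmax(axis=1)])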

Due to the unbalanced data distribution in distributed learning, the complex environment in real application scenarios and the individual needs of users, the data size of each user's
computational task varies over time, and the resource allocation scheme needs to be dynamically adjusted to meet the users' needs. In [141], a support vector machine-based joint learning approach is proposed for cellular networks with MEC capabilities to proactively determine user associations. Given a user association, the base station can collect information related to the computational tasks of its associated users, and utilize this information to optimize the transmit power and task allocation for each user while minimizing the energy consumption of each user. To address the dynamic malicious interference problem in communication networks, the authors of [142] propose a distributed multi-agent spectrum access strategy without interactive communication overhead, which employs a simplified Q-learning algorithm to mitigate spectrum conflicts among nodes.
Due to the dramatic growth in the number of mobile devices and the unbalanced data distribution in distributed learning, further research on the dynamic allocation of resources in wireless communication networks is needed to devise power allocation schemes that better meet individual user needs and further improve spectrum utilization. The computational capacity and energy consumption of each terminal device in distributed learning are different. Therefore, the design of power allocation schemes based on the different residual energy of the terminal devices can further improve energy utilization and avoid wasting resources. In distributed learning, a large number of devices are non-stationary, the wireless channel state between the devices and the central server is often uncertain, and accurate CSI is often difficult to obtain. Further research on power allocation schemes under imperfect CSI is thus required.
C. Low-Latency Communications

Ultra-reliable and low-latency communication is one of the three major directions of the 5G application scenarios defined by 3GPP. For example, some virtual reality scenarios require sufficiently low latency to ensure the user experience, while in the case of autonomous driving, remote control, etc., latency is directly related to the system implementation and safety factor.
One advantage of distributed systems is that multiple computers are interconnected, which makes it possible to dynamically distribute tasks and improve the speed of task execution. However, how to cope with different scenarios or communication needs, coordinate the task interaction between the central processor and the end devices, and find algorithms that achieve the best performance remain important research directions. For example, the authors of [98] optimize data sharing and radio resource allocation for a D2D-enabled network model, with good convergence for non-independent and identically distributed (non-IID) data samples and reduced iterative training delays. A distributed learning framework is proposed in [143] for low-latency model communication in large-scale IoT systems, and the effectiveness of the learning algorithm is analyzed with different latency targets.
The low-latency performance of algorithms is also affected by other conditions such as computational accuracy, energy efficiency, user capacity, and power. The latency problem thus has to be balanced against these conditions to meet the different requirements of different scenarios. In this regard, the power minimization problem for distributed FL applied to vehicular communications, while ensuring low latency and high reliability in terms of the probabilistic queue length, is studied in [26]. An improved distributed algorithm is proposed in [97] to strike a balance between the energy efficiency and the delay in D2D networks, which improves the convergence speed. Also, in [144], the low-latency problem for joint multi-task learning in MEC networks is presented, and the impact of the number of participants, edge node capacity, local accuracy, and energy threshold on the latency is considered.
D. Security and Robustness

With the rapid growth of the Internet, the scale of data transfer has expanded dramatically, and data security and privacy have received widespread attention. Effective privacy protection against single points of failure has been achieved by using a blockchain-enabled FL approach, such as the use of a fog server to interact with and update end devices in [110]. In [145], the reliability of task execution is improved by reputation screening of reliable computing endpoints and effective reputation management enabled by a blockchain. The ADMM algorithm is commonly used to solve distributed convex optimization problems, and the incremental ADMM algorithm proposed in [146] is an improvement of the traditional ADMM method, in which random initialization and step perturbation are used to communicate efficiently while maintaining privacy.

Messages from distributed nodes are prone to errors due to hardware failures or software errors, computational errors, data corruption and network transmission problems. There are even malicious attacks on the system by unreliable distributed workers who actively send erroneous and malicious messages to the master server. Among these interference models, the most important one is the Byzantine threat model, in which computing nodes can act arbitrarily and maliciously. Therefore, it is important to study distributed algorithms that can cope with Byzantine attacks. Several papers have investigated this aspect, such as [147], where the authors consider a total variation norm penalty approximation to deal with Byzantine attacks and propose a fault-tolerant stochastic ADMM algorithm. A distributed gradient descent algorithm is proposed in [148], where a simple thresholding rule based on the gradient norm is applied to filter out Byzantine messages. A two-step learning framework is proposed in [128], which generates independent and identically distributed datasets at the edge devices and only requires uploading the output of the local model, thus reducing private data uploads and achieving robustness to Byzantine attacks.
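To illustrate why robust aggregation helps, the minimal sketch below (our own example, not the algorithms of [147] or [148]) compares plain averaging with coordinate-wise median and trimmed-mean aggregation when a minority of workers report adversarially scaled gradients; the attack model and all sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, n_byz = 10, 20, 4

true_grad = rng.normal(size=dim)
grads = true_grad + 0.1 * rng.normal(size=(n_workers, dim))
grads[:n_byz] = 100.0 * rng.normal(size=(n_byz, dim))   # Byzantine messages

def trimmed_mean(g, b):
    # Drop the b smallest and b largest values per coordinate, then average.
    s = np.sort(g, axis=0)
    return s[b:n_workers - b].mean(axis=0)

aggregates = {
    "mean": grads.mean(axis=0),
    "coordinate-wise median": np.median(grads, axis=0),
    "trimmed mean": trimmed_mean(grads, n_byz),
}
for name, agg in aggregates.items():
    print(f"{name:24s} error: {np.linalg.norm(agg - true_grad):.3f}")

The plain mean is dominated by the malicious messages, while both robust rules stay close to the true gradient as long as fewer than half of the workers are Byzantine.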


Differential privacy is also a way to improve data security. The most common method in distributed algorithms based on differential privacy is to add a certain amount of noise to the data transmitted from the client to the central server, with the aim of making it difficult for an attacker to recover the private information of any single endpoint. A differential privacy-based uncoded transmission scheme that does not affect the learning performance under privacy constraints below a certain threshold is studied in [150]. Distributed algorithms based on differential privacy can be found in [151] and [152] as well. A theoretical analysis of the tradeoff between privacy and convergence performance during training is presented in [151]. In [152], the tradeoff between communication efficiency and privacy performance is analyzed, and the impact of several design parameters on the communication efficiency is pointed out.

Research on improving the security and privacy of distributed algorithms, with higher attack tolerance and lower computational cost, is still required and remains an important research direction.
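The noise-addition mechanism described above can be made concrete with a short sketch (a generic Gaussian mechanism under our own assumptions on the clipping norm and noise multiplier, not the specific schemes of [150]-[152]): each client clips its update to bound the sensitivity and adds calibrated Gaussian noise before transmission.

import numpy as np

rng = np.random.default_rng(1)

def privatize(update, clip_norm=1.0, noise_multiplier=0.3):
    # Clip to L2 norm clip_norm (bounds per-client sensitivity), then
    # add Gaussian noise scaled to the clipping norm.
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm,
                       size=update.shape)
    return update * scale + noise

clean = [rng.normal(scale=0.1, size=50) for _ in range(10)]  # raw updates
noisy = [privatize(u) for u in clean]                        # server's view

avg_clean = np.mean(clean, axis=0)
avg_noisy = np.mean(noisy, axis=0)
print("relative distortion of the aggregate:",
      float(np.linalg.norm(avg_noisy - avg_clean)
            / np.linalg.norm(avg_clean)))

Averaging over clients partially cancels the injected noise, which is why the privacy-utility tradeoff improves with more participants, as analyzed in [151].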
V. CONCLUSION

In this paper, we have provided an overview of distributed learning techniques in wireless communications. We have presented typical distributed learning frameworks and algorithms that lay the foundation for the subsequent application discussions. We have highlighted promising applications of distributed learning in emerging wireless communication scenarios in the physical layer, media access control layer and network layer. Finally, we have outlined the primary future research directions of distributed learning techniques in wireless communications and their challenges.
REFERENCES

[1] K. David and H. Berndt, “6G vision and requirements: Is there any need for beyond 5G?,” IEEE Veh. Technol. Mag., vol. 13, no. 3, pp. 72–80, Sep. 2018.
[2] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, May/Jun. 2020.
[3] P. Yang, Y. Xiao, M. Xiao, and S. Li, “6G wireless communications: Vision and potential techniques,” IEEE Netw., vol. 33, no. 4, pp. 70–75, Jul./Aug. 2019.
[4] F. Tang, Y. Kawamoto, N. Kato, and J. Liu, “Future intelligent and secure vehicular network toward 6G: Machine-learning approaches,” Proc. IEEE, vol. 108, no. 2, pp. 292–307, Feb. 2020.
[5] L. Zhang, Y. Liang, and D. Niyato, “6G visions: Mobile ultra-broadband, super Internet-of-Things, and artificial intelligence,” China Commun., vol. 16, no. 8, pp. 1–14, Aug. 2019.
[6] L. Bariah et al., “A prospective look: Key enabling technologies, applications and open research topics in 6G networks,” IEEE Access, vol. 8, pp. 174792–174820, 2020.
[7] Z. Zhang et al., “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 28–41, Sep. 2019.
[8] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
[9] L. Xiao, X. Wan, X. Lu, Y. Zhang, and D. Wu, “IoT security techniques based on machine learning: How do IoT devices use AI to enhance security?,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 41–49, Sep. 2018.
[10] D. Gündüz, P. de Kerret, N. D. Sidiropoulos, D. Gesbert, C. R. Murthy, and M. van der Schaar, “Machine learning in the air,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2184–2199, Oct. 2019.
[11] J. Zhang and K. B. Letaief, “Mobile edge intelligence and computing for the Internet of vehicles,” Proc. IEEE, vol. 108, no. 2, pp. 246–261, Feb. 2020.
[12] 3GPP TR 38.885, Study on NR Vehicle-to-Everything (V2X) (Release 16), v16.0.0, Mar. 2019.
[13] T. J. O'Shea, T. Roy, and T. C. Clancy, “Over-the-air deep learning based radio signal classification,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 168–179, Feb. 2018.
[14] K. Merchant, S. Revay, G. Stantchev, and B. Nousain, “Deep learning for RF device fingerprinting in cognitive communication networks,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 160–167, Feb. 2018.
[15] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be'ery, “Deep learning methods for improved decoding of linear codes,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131, Feb. 2018.
[16] M. Amjad, F. Akhtar, M. H. Rehmani, M. Reisslein, and T. Umer, “Full-duplex communication in cognitive radio networks: A survey,” IEEE Commun. Surv. Tut., vol. 19, no. 4, pp. 2158–2191, Jun. 2017.
[17] Q. Wu, L. Liu, and R. Zhang, “Fundamental trade-offs in communication and trajectory design for UAV-enabled wireless network,” IEEE Wireless Commun., vol. 26, no. 1, pp. 36–44, Feb. 2019.
[18] A. Thornburg, T. Bai, and R. W. Heath, “Performance analysis of outdoor mmWave ad hoc networks,” IEEE Trans. Signal Process., vol. 64, no. 15, pp. 4065–4079, Aug. 2016.
[19] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,” Proc. IEEE, vol. 107, no. 11, pp. 2204–2239, Nov. 2019.
[20] M. M. Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Trans. Signal Process., vol. 68, pp. 2155–2169, Mar. 2020.
[21] Y. Xu, P. Cheng, Z. Chen, Y. Li, and B. Vucetic, “Mobile collaborative spectrum sensing for heterogeneous networks: A Bayesian machine learning approach,” IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5634–5647, Nov. 2018.
[22] R. Xin, S. Kar, and U. A. Khan, “Decentralized stochastic optimization and machine learning: A unified variance-reduction framework for robust performance and fast convergence,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 102–113, May 2020.
[23] R. Nassif, S. Vlaski, C. Richard, J. Chen, and A. H. Sayed, “Multitask learning over graphs: An approach for distributed, streaming machine learning,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 14–25, May 2020.
[24] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, May 2020.
[25] X. Wang, C. Wang, X. Li, V. C. M. Leung, and T. Taleb, “Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching,” IEEE Internet Things J., vol. 7, no. 10, pp. 9441–9455, Oct. 2020.
[26] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Distributed federated learning for ultra-reliable low-latency vehicular communications,” IEEE Trans. Commun., vol. 68, no. 2, pp. 1146–1159, Feb. 2020.
[27] Z. Huang, R. Hu, Y. Guo, E. Chan-Tin, and Y. Gong, “DP-ADMM: ADMM-based distributed learning with differential privacy,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 1002–1012, Jul. 2019.
[28] C. Kumar and K. Rajawat, “Network dissensus via distributed ADMM,” IEEE Trans. Signal Process., vol. 68, pp. 2287–2301, Apr. 2020.
[29] A. Nedic, “Distributed gradient methods for convex machine learning problems in networks: Distributed optimization,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 92–101, May 2020.
[30] J. Park et al., “Communication-efficient and distributed learning over wireless networks: Principles and applications,” Proc. IEEE, vol. 109, no. 5, pp. 796–819, May 2021.
[31] S. Shi, Z. Tang, X. Chu, C. Liu, W. Wang, and B. Li, “A quantitative survey of communication optimizations in distributed deep learning,” IEEE Netw., vol. 35, no. 3, pp. 230–237, May/Jun. 2021.
[32] Y. Liu, X. Yuan, Z. Xiong, J. Kang, X. Wang, and D. Niyato, “Federated learning for 6G communications: Challenges, methods, and future directions,” China Commun., vol. 17, no. 9, pp. 105–118, Sep. 2020.
[33] S. Abdulrahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi, and M. Guizani, “A survey on federated learning: The journey from centralized to distributed on-site learning and beyond,” IEEE Internet Things J., vol. 8, no. 7, pp. 5476–5497, Apr. 2021.
[34] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb, “Federated machine learning: Survey, multi-level classification, desirable criteria and future directions in communication and networking systems,” IEEE Commun. Surv. Tut., vol. 23, no. 2, pp. 1342–1397, Apr.–Jun. 2021.
[35] L. U. Khan, W. Saad, Z. Han, E. Hossain, and C. S. Hong, “Federated learning for Internet of Things: Recent advances, taxonomy, and open challenges,” IEEE Commun. Surv. Tut., vol. 23, no. 3, pp. 1759–1799, Jul.–Sep. 2021.
[36] M. Li et al., “Scaling distributed machine learning with the parameter server,” in Proc. USENIX Symp. Operating Syst. Des. Implementation, Broomfield, WI, 2014, pp. 583–598.


[37] H. B. McMahan et al., “Communication-efficient learning of deep networks from decentralized data,” in Proc. 20th Int. Conf. Artif. Intell. Statist., Fort Lauderdale, FL, USA, 2017, pp. 1273–1282.
[38] C. S. Hong, L. U. Khan, M. Chen, D. Chen, W. Saad, and Z. Han, Federated Learning for Wireless Networks. Berlin, Germany: Springer, 2021.
[39] S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated learning for wireless communications: Motivation, opportunities, and challenges,” IEEE Commun. Mag., vol. 58, no. 6, pp. 46–51, Jun. 2020.
[40] S. Hosseinalipour, C. G. Brinton, V. Aggarwal, H. Dai, and M. Chiang, “From federated to fog learning: Distributed machine learning over heterogeneous wireless networks,” IEEE Commun. Mag., vol. 58, no. 12, pp. 41–47, Dec. 2020.
[41] S. Shi, X. Chu, and B. Li, “MG-WFBP: Efficient data communication for distributed synchronous SGD algorithms,” in Proc. IEEE Conf. Comput. Commun., Paris, France, Apr. 2019, pp. 172–180.
[42] S. Hu, X. Chen, W. Ni, E. Hossain, and X. Wang, “Distributed machine learning for wireless communication networks: Techniques, architectures, and applications,” IEEE Commun. Surv. Tut., vol. 23, no. 3, pp. 1458–1493, Jun. 2021.
[43] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,” IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 109–128, Jan. 2018.
[44] J. Rosen et al., “Iterative MapReduce for large scale machine learning,” Comput. Sci., 2013.
[45] S. Hou, W. Ni, S. Zhao, B. Cheng, S. Chen, and J. Chen, “Real-time optimization of dynamic speed scaling for distributed data centers,” IEEE Trans. Netw. Sci. Eng., vol. 7, no. 3, pp. 2090–2103, Jul.–Sep. 2020.
[46] S. Jeon, H. Chung, W. Choi, H. Shin, and Y. Nah, “MapReduce tuning to improve distributed machine learning performance,” in Proc. IEEE First Int. Conf. Artif. Intell. Knowl. Eng., Laguna Hills, CA, USA, 2018, pp. 198–200.
[47] P. H. Jin, Q. Yuan, F. Iandola, and K. Keutzer, “How to scale distributed deep learning?,” in Proc. NIPS ML Syst. Workshop, Barcelona, Spain, 2016.
[48] P. Zhou, Q. Lin, D. Loghin, B. C. Ooi, Y. Wu, and H. Yu, “Communication-efficient decentralized machine learning over heterogeneous networks,” in Proc. IEEE 37th Int. Conf. Data Eng., Chania, Greece, 2021, pp. 384–395.
[49] L. Abrahamyan, Y. Chen, G. Bekoulis, and N. Deligiannis, “Learned gradient compression for distributed deep learning,” IEEE Trans. Neural Netw. Learn. Syst., doi: 10.1109/TNNLS.2021.3084806.
[50] C.-Y. Chen et al., “ScaleCom: Scalable sparsified gradient compression for communication-efficient distributed training,” in Proc. Adv. Neural Inf. Process. Syst., Vancouver, Canada, 2020, pp. 13551–13563.
[51] X. Tian, B. Xie, and J. Zhan, “Cymbalo: An efficient graph processing framework for machine learning,” in Proc. IEEE Int. Conf. Parallel Distrib. Process. Appl., Ubiquitous Comput. Commun., Big Data Cloud Comput., Social Comput. Netw., Sustain. Comput. Commun., Melbourne, VIC, Australia, 2018, pp. 572–579.
[52] H. Ye, L. Liang, G. Y. Li, and B. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020.
[53] W. Ma, C. Qi, Z. Zhang, and J. Cheng, “Sparse channel estimation and hybrid precoding using deep learning for millimeter wave massive MIMO,” IEEE Trans. Commun., vol. 68, no. 5, pp. 2838–2849, May 2020.
[54] J. Kim, J. Park, J. Noh, and S. Cho, “Autonomous power allocation based on distributed deep learning for device-to-device communication underlaying cellular network,” IEEE Access, vol. 8, pp. 107853–107864, 2020.
[55] Y. Chen, X. Sun, and Y. Jin, “Communication-efficient federated deep learning with layerwise asynchronous model update and temporally weighted aggregation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 10, pp. 4229–4238, Oct. 2020.
[56] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Sparse binary compression: Towards distributed deep learning with minimal communication,” in Proc. Int. Joint Conf. Neural Netw., Budapest, Hungary, 2019, pp. 1–8.
[57] D. Kuang, M. Chen, D. Xiao, and W. Wu, “Entropy-based gradient compression for distributed deep learning,” in Proc. IEEE 21st Int. Conf. High Perform. Comput. Commun.; IEEE 17th Int. Conf. Smart City; IEEE 5th Int. Conf. Data Sci. Syst., Zhangjiajie, China, 2019, pp. 231–238.
[58] L. Liang, H. Ye, and G. Y. Li, “Spectrum sharing in vehicular networks based on multi-agent reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2282–2292, Oct. 2019.
[59] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 310–323, Jan. 2019.
[60] Z. Ning et al., “When deep reinforcement learning meets 5G-enabled vehicular networks: A distributed offloading framework for traffic Big Data,” IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1352–1361, Feb. 2020.
[61] J. Li, H. Gao, T. Lv, and Y. Lu, “Deep reinforcement learning based computation offloading and resource allocation for MEC,” in Proc. IEEE Wireless Commun. Netw. Conf., Barcelona, Spain, 2018, pp. 1–6.
[62] Z. Shi, W. Gao, J. Liu, N. Kato, and Y. Zhang, “Distributed Q-learning-assisted grant-free NORA for massive machine-type communications,” in Proc. IEEE Glob. Commun. Conf., Taipei, Taiwan, 2020, pp. 1–5.
[63] S. K. Sharma and X. Wang, “Collaborative distributed Q-learning for RACH congestion minimization in cellular IoT networks,” IEEE Commun. Lett., vol. 23, no. 4, pp. 600–603, Apr. 2019.
[64] Z. Han, H. Li, and W. Yin, Compressive Sensing for Wireless Networks. Cambridge, U.K.: Cambridge Univ. Press, 2013.
[65] W. Deng et al., “Parallel multi-block ADMM with O(1/k) convergence,” J. Sci. Comput., vol. 71, no. 2, pp. 712–736, May 2017.
[66] W. Li, Y. Liu, Z. Tian, and Q. Ling, “Communication-censored linearized ADMM for decentralized consensus optimization,” IEEE Trans. Signal Inf. Process. Netw., vol. 6, pp. 18–34, Dec. 2020.
[67] S. Zhu and B. Chen, “Quantized consensus by the ADMM: Probabilistic versus deterministic quantizers,” IEEE Trans. Signal Process., vol. 64, no. 7, pp. 1700–1713, Apr. 2016.
[68] A. Elgabli, J. Park, A. S. Bedi, M. Bennis, and V. Aggarwal, “Communication efficient framework for decentralized machine learning,” in Proc. 54th Annu. Conf. Inf. Sci. Syst., Princeton, NJ, 2020, pp. 1–5.
[69] O. Shamir, N. Srebro, and T. Zhang, “Communication efficient distributed optimization using an approximate Newton-type method,” in Proc. Int. Conf. Mach. Learn., Beijing, China, 2014, pp. 1000–1008.
[70] T. Anderson, C.-Y. Chang, and S. Martínez, “Weight design of distributed approximate Newton algorithms for constrained optimization,” in Proc. IEEE Conf. Control Technol. Appl., Maui, HI, USA, 2017, pp. 632–637.
[71] V. Smith, S. Forte, C. Ma, M. Takac, M. Jordan, and M. Jaggi, “CoCoA: A general framework for communication-efficient distributed optimization,” J. Mach. Learn. Res., vol. 18, no. 230, pp. 1–49, Jan. 2018.
[72] C. Ma et al., “Distributed optimization with arbitrary local solvers,” Optim. Methods Softw., vol. 32, no. 4, pp. 813–848, Feb. 2017.
[73] T. T. Vu, D. T. Ngo, N. H. Tran, H. Q. Ngo, M. N. Dao, and R. H. Middleton, “Cell-free massive MIMO for wireless federated learning,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6377–6392, Oct. 2020.
[74] Y. S. Jeon, M. M. Amiri, J. Li, and H. V. Poor, “A compressive sensing approach for federated learning over massive MIMO communication systems,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1990–2004, Mar. 2021.
[75] K. K. Wong, G. Liu, W. Cun, W. Zhang, M. Zhao, and Z. Zheng, “Truly distributed multicell multi-band multiuser MIMO by synergizing game theory and deep learning,” IEEE Access, vol. 9, pp. 30347–30358, 2021.
[76] H. Zhang, B. Di, L. Song, and Z. Han, Reconfigurable Intelligent Surface-Empowered 6G. Berlin, Germany: Springer, 2021.
[77] K. Yang, Y. Shi, Y. Zhou, Z. Yang, L. Fu, and W. Chen, “Federated machine learning for intelligent IoT via reconfigurable intelligent surface,” IEEE Netw., vol. 34, no. 5, pp. 16–22, Sep./Oct. 2020.
[78] H. Liu, X. Yuan, and Y. J. A. Zhang, “Joint communication-learning design for RIS-assisted federated learning,” in Proc. IEEE Int. Conf. Commun. Workshops, Montreal, QC, Canada, 2021, pp. 1–6.
[79] H. Liu, X. Yuan, and Y. J. A. Zhang, “Reconfigurable intelligent surface enabled federated learning: A unified communication-learning design approach,” IEEE Trans. Wireless Commun., vol. 20, no. 11, pp. 7595–7609, Nov. 2021.
[80] H. Liu, X. Yuan, and Y. J. A. Zhang, “CSIT-free model aggregation for federated edge learning via reconfigurable intelligent surface,” IEEE Wireless Commun. Lett., vol. 10, no. 11, pp. 2440–2444, Nov. 2021.
[81] W. Ni, Y. Liu, Z. Yang, and H. Tian, “Over-the-air federated learning and non-orthogonal multiple access unified by reconfigurable intelligent surface,” in Proc. IEEE Conf. Comput. Commun. Workshops, Vancouver, BC, Canada, 2021, pp. 1–6.
[82] Y. Yang, Y. Zhou, T. Wang, and Y. Shi, “Reconfigurable intelligent surface assisted federated learning with privacy guarantee,” in Proc. IEEE Int. Conf. Commun. Workshops, Montreal, QC, Canada, 2021, pp. 1–6.
[83] L. Li et al., “Enhanced reconfigurable intelligent surface assisted mmWave communication: A federated learning approach,” China Commun., vol. 17, no. 10, pp. 115–128, Oct. 2020.

[84] M. Zandi, M. Dong, and A. Grami, “Distributed stochastic learning and adaptation to primary traffic for dynamic spectrum access,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 1675–1688, Mar. 2016.
[85] M. B. Ghorbel, B. Hamdaoui, M. Guizani, and B. Khalfi, “Distributed learning-based cross-layer technique for energy-efficient multicarrier dynamic spectrum access with adaptive power allocation,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 1665–1674, Mar. 2016.
[86] Q. Shan, J. Xiong, D. Ma, J. Li, and T. Hu, “Distributed multi-agent Q-learning for anti-dynamic jamming and collision-avoidance spectrum access in cognitive radio system,” in Proc. 24th Asia-Pacific Conf. Commun., Ningbo, China, 2018, pp. 428–432.
[87] Z. Shi, W. Gao, J. Liu, N. Kato, and Y. Zhang, “Distributed Q-learning-assisted grant-free NORA for massive machine-type communications,” in Proc. IEEE Glob. Commun. Conf., Taipei, Taiwan, 2020, pp. 1–5.
[88] M. Shin and M. Y. Chung, “Learning-based distributed multi-channel dynamic access for cellular spectrum sharing of multiple operators,” in Proc. 25th Asia-Pacific Conf. Commun., Ho Chi Minh City, Vietnam, 2019, pp. 384–387.
[89] Y. Sun, Q. Wu, Y. Xu, Y. Zhang, F. Sun, and J. Wang, “Distributed channel access for device-to-device communications: A hypergraph-based learning solution,” IEEE Commun. Lett., vol. 21, no. 1, pp. 180–183, Jan. 2017.
[90] L. Song, D. Niyato, Z. Han, and E. Hossain, Wireless Device-to-Device Communications and Networks. Cambridge, U.K.: Cambridge Univ. Press, 2015.
[91] D. Shi, L. Li, T. Ohtsuki, M. Pan, Z. Han, and H. V. Poor, “Make smart decisions faster: Deciding D2D resource allocation via Stackelberg game guided multi-agent deep reinforcement learning,” IEEE Trans. Mobile Comput., early access, doi: 10.1109/TMC.2021.3085206.
[92] P. Gandotra, R. K. Jha, and S. Jain, “A survey on device-to-device (D2D) communication: Architecture and security issues,” J. Netw. Comput. Appl., vol. 78, pp. 9–29, Jan. 2017.
[93] M. S. Ali, P. Coucheney, and M. Coupechoux, “Distributed learning in noisy-potential games for resource allocation in D2D networks,” IEEE Trans. Mobile Comput., vol. 19, no. 12, pp. 2761–2773, Dec. 2020.
[94] Y. Huang, T. Tan, N. Wang, Y. Chen, and Y. Li, “Resource allocation for D2D communications with a novel distributed Q-learning algorithm in heterogeneous networks,” in Proc. Int. Conf. Mach. Learn. Cybern., Chengdu, China, 2018, pp. 533–537.
[95] S. Dominic and L. Jacob, “Distributed learning approach for joint channel and power allocation in underlay D2D networks,” in Proc. Int. Conf. Signal Process. Commun., Noida, India, 2016, pp. 145–150.
[96] J. Kim, J. Park, J. Noh, and S. Cho, “Autonomous power allocation based on distributed deep learning for device-to-device communication underlaying cellular network,” IEEE Access, vol. 8, pp. 107853–107864, 2020.
[97] Y. Luo, M. Zeng, and H. Jiang, “Learning to tradeoff between energy efficiency and delay in energy harvesting-powered D2D communication: A distributed experience-sharing algorithm,” IEEE Internet Things J., vol. 6, no. 3, pp. 5585–5594, Jun. 2019.
[98] X. Cai, X. Mo, J. Chen, and J. Xu, “D2D-enabled data sharing for distributed machine learning at wireless network edge,” IEEE Wireless Commun. Lett., vol. 9, no. 9, pp. 1457–1461, Sep. 2020.
[99] F. Librino and G. Quer, “Distributed mode and power selection for non-orthogonal D2D communications: A stochastic approach,” IEEE Trans. Cogn. Commun. Netw., vol. 4, no. 2, pp. 232–243, Jun. 2018.
[100] H. Zhang and L. Hanzo, “Federated learning assisted multi-UAV networks,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 14104–14109, Nov. 2020.
[101] T. Zeng, O. Semiari, M. Mozaffari, M. Chen, W. Saad, and M. Bennis, “Federated learning in the sky: Joint power allocation and scheduling with UAV swarms,” in Proc. IEEE Int. Conf. Commun., Dublin, Ireland, 2020, pp. 1–6.
[102] Q. V. Pham, M. Zeng, R. Ruby, T. Huynh-The, and W. J. Hwang, “UAV communications for sustainable federated learning,” IEEE Trans. Veh. Technol., vol. 70, no. 4, pp. 3944–3948, Apr. 2021.
[103] Y. Wang, Y. Yang, and T. Luo, “Federated convolutional auto-encoder for optimal deployment of UAVs with visible light communications,” in Proc. IEEE Int. Conf. Commun. Workshops, Dublin, Ireland, 2020, pp. 1–6.
[104] H. Shiri, J. Park, and M. Bennis, “Communication-efficient massive UAV online path control: Federated learning meets mean-field game theory,” IEEE Trans. Commun., vol. 68, no. 11, pp. 6840–6857, Nov. 2020.
[105] B. Khamidehi and E. S. Sousa, “Federated learning for cellular-connected UAVs: Radio mapping and path planning,” in Proc. IEEE Glob. Commun. Conf., Taipei, Taiwan, 2020, pp. 1–6.
[106] C. H. Liu, X. Ma, X. Gao, and J. Tang, “Distributed energy-efficient multi-UAV navigation for long-term communication coverage by deep reinforcement learning,” IEEE Trans. Mobile Comput., vol. 19, no. 6, pp. 1274–1285, Jun. 2020.
[107] Z. Xiong, Y. Zhang, D. Niyato, P. Wang, and Z. Han, “When mobile blockchain meets edge computing,” IEEE Commun. Mag., vol. 56, no. 8, pp. 33–39, Aug. 2018.
[108] Z. Huang, F. Liu, M. Tang, J. Qiu, and Y. Peng, “A distributed computing framework based on lightweight variance reduction method to accelerate machine learning training on blockchain,” China Commun., vol. 17, no. 9, pp. 77–89, Sep. 2020.
[109] H. Kim, J. Park, M. Bennis, and S. Kim, “Blockchained on-device federated learning,” IEEE Commun. Lett., vol. 24, no. 6, pp. 1279–1283, Jun. 2020.
[110] Y. Qu et al., “Decentralized privacy using blockchain-enabled federated learning in fog computing,” IEEE Internet Things J., vol. 7, no. 6, pp. 5171–5183, Jun. 2020.
[111] M. Cao, B. Cao, W. Hong, Z. Zhao, X. Bai, and L. Zhang, “DAG-FL: Direct acyclic graph-based blockchain empowers on-device federated learning,” in Proc. IEEE Int. Conf. Commun., Montreal, QC, Canada, 2021, pp. 1–6.
[112] K. C. Tsai et al., “Tensor-based reinforcement learning for network routing,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 3, pp. 617–629, Apr. 2021.
[113] X. Qiu, D. Lin, X. Feng, Y. Chen, J. Hu, and H. Zheng, “A deep gradient compression model based on tensor-train decomposition for federated learning,” in Proc. Cross Strait Radio Sci. Wireless Technol. Conf., Fuzhou, China, 2020, pp. 1–3.
[114] H. Zheng, M. Gao, Z. Chen, and X. Feng, “A distributed hierarchical deep computation model for federated learning in edge computing,” IEEE Trans. Ind. Informat., vol. 17, no. 12, pp. 7946–7956, Dec. 2021.
[115] K. Kostopoulou, H. Xu, A. Dutta, X. Li, A. Ntoulas, and P. Kalnis, “DeepReduce: A sparse-tensor communication framework for distributed deep learning,” 2021, arXiv:2102.03112.
[116] J. Xu, W. Du, Y. Jin, W. He, and R. Cheng, “Ternary compression for communication-efficient federated learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 3, pp. 1162–1176, Mar. 2022.
[117] Y. Du, S. Yang, and K. Huang, “High-dimensional stochastic gradient quantization for communication-efficient edge learning,” in Proc. IEEE Glob. Conf. Signal Inf. Process., Ottawa, ON, Canada, 2019, pp. 1–5.
[118] S. Li, Q. Qi, J. Wang, H. Sun, Y. Li, and F. R. Yu, “GGS: General gradient sparsification for federated learning in edge computing,” in Proc. IEEE Int. Conf. Commun., Dublin, Ireland, 2020, pp. 1–7.
[119] H. Sun, S. Li, F. R. Yu, Q. Qi, J. Wang, and J. Liao, “Toward communication-efficient federated learning in the Internet of Things with edge computing,” IEEE Internet Things J., vol. 7, no. 11, pp. 11053–11067, Nov. 2020.
[120] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Sparse binary compression: Towards distributed deep learning with minimal communication,” in Proc. Int. Joint Conf. Neural Netw., Budapest, Hungary, 2019, pp. 1–8.
[121] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Robust and communication-efficient federated learning from non-i.i.d. data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 9, pp. 3400–3413, Sep. 2020.
[122] M. Chen, Z. Yan, J. Ren, and W. Wu, “Standard deviation based adaptive gradient compression for distributed deep learning,” in Proc. 20th IEEE/ACM Int. Symp. Cluster, Cloud Internet Comput., Melbourne, VIC, Australia, 2020, pp. 529–538.
[123] L. Wang, W. Wang, and B. Li, “CMFL: Mitigating communication overhead for federated learning,” in Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst., Dallas, TX, 2019, pp. 954–964.
[124] Y. Chen, B. M. Sadler, and R. S. Blum, “Ordered gradient approach for communication-efficient distributed learning,” in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun., Atlanta, GA, 2020, pp. 954–964.
[125] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1514–1529, Mar. 2018.


[126] S. Kadhe, O. O. Koyluoglu, and K. Ramchandran, “Communication-efficient gradient coding for straggler mitigation in distributed learning,” in Proc. IEEE Int. Symp. Inf. Theory, Los Angeles, CA, 2020, pp. 2634–2639.
[127] J. H. Ahn, O. Simeone, and J. Kang, “Cooperative learning via federated distillation over fading channels,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Barcelona, Spain, 2020, pp. 8856–8860.
[128] H. Wen, Y. Wu, C. Yang, H. Duan, and S. Yu, “A unified federated learning framework for wireless communications: Towards privacy, efficiency, and security,” in Proc. IEEE Conf. Comput. Commun. Workshops, Toronto, ON, Canada, 2020, pp. 653–658.
[129] X. Yao, C. Huang, and L. Sun, “Two-stream federated learning: Reduce the communication costs,” in Proc. IEEE Vis. Commun. Image Process., Taichung, Taiwan, 2018, pp. 1–4.
[130] K. Yuan, Q. Ling, and W. Yin, “On the convergence of decentralized gradient descent,” SIAM J. Optim., vol. 26, no. 3, pp. 1835–1854, Sep. 2016.
[131] W. Liu, L. Chen, Y. Chen, and W. Zhang, “Accelerating federated learning via momentum gradient descent,” IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 8, pp. 1754–1766, Aug. 2020.
[132] J. Wang, H. Liang, and G. Joshi, “Overlap local-SGD: An algorithmic approach to hide communication delays in distributed SGD,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Barcelona, Spain, 2020, pp. 8871–8875.
[133] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of the ADMM in decentralized consensus optimization,” IEEE Trans. Signal Process., vol. 62, no. 7, pp. 1750–1761, Apr. 2014.
[134] J. Zhang, “Asynchronous decentralized consensus ADMM for distributed machine learning,” in Proc. Int. Conf. High Perform. Big Data Intell. Syst., Shenzhen, China, 2019, pp. 22–28.
[135] Z. Han and K. J. R. Liu, Resource Allocation for Wireless Networks: Basics, Techniques, and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[136] Z. Han, D. Niyato, W. Saad, T. Basar, and A. Hjørungnes, Game Theory in Wireless and Communication Networks: Theory, Models and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[137] X. Ma, H. Sun, and R. Q. Hu, “Scheduling policy and power allocation for federated learning in NOMA based MEC,” in Proc. IEEE Glob. Commun. Conf., Taipei, Taiwan, 2020, pp. 1–7.
[138] N. Zhang and M. Tao, “Gradient statistics aware power control for over-the-air federated learning in fading channels,” in Proc. IEEE Int. Conf. Commun. Workshops, Dublin, Ireland, 2020, pp. 1–6.
[139] S. Dominic and L. Jacob, “Distributed learning approach for joint channel and power allocation in underlay D2D networks,” in Proc. Int. Conf. Signal Process. Commun., Noida, India, 2016, pp. 145–150.
[140] M. Yan, B. Chen, G. Feng, and S. Qin, “Federated cooperation and augmentation for power allocation in decentralized wireless networks,” IEEE Access, vol. 8, pp. 48088–48100, 2020.
[141] S. Wang, M. Chen, W. Saad, and C. Yin, “Federated learning for energy-efficient task computing in wireless networks,” in Proc. IEEE Int. Conf. Commun., Dublin, Ireland, 2020, pp. 1–6.
[142] Q. Shan, J. Xiong, D. Ma, J. Li, and T. Hu, “Distributed multi-agent Q-learning for anti-dynamic jamming and collision-avoidance spectrum access in cognitive radio system,” in Proc. 24th Asia-Pacific Conf. Commun., Ningbo, China, 2018, pp. 428–432.
[143] T. Park and W. Saad, “Distributed learning for low latency machine type communication in a massive Internet of Things,” IEEE Internet Things J., vol. 6, no. 3, pp. 5562–5576, Jun. 2019.
[144] D. Chen et al., “Matching-theory-based low-latency scheme for multitask federated learning in MEC networks,” IEEE Internet Things J., vol. 8, no. 14, pp. 11415–11426, Jul. 2021.
[145] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani, “Reliable federated learning for mobile networks,” IEEE Wireless Commun., vol. 27, no. 2, pp. 72–80, Apr. 2020.
[146] Y. Ye, H. Chen, M. Xiao, M. Skoglund, and H. V. Poor, “Incremental ADMM with privacy-preservation for decentralized consensus optimization,” in Proc. IEEE Int. Symp. Inf. Theory, Los Angeles, CA, 2020, pp. 209–214.
[147] F. Lin, Q. Ling, W. Li, and Z. Xiong, “Stochastic ADMM for Byzantine-robust distributed learning,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Barcelona, Spain, 2020, pp. 3172–3176.
[148] A. Ghosh, R. K. Maity, S. Kadhe, A. Mazumdar, and K. Ramachandran, “Communication efficient and Byzantine tolerant distributed learning,” in Proc. IEEE Int. Symp. Inf. Theory, Los Angeles, CA, 2020, pp. 2545–2550.
[149] A. Triastcyn and B. Faltings, “Federated learning with Bayesian differential privacy,” in Proc. IEEE Int. Conf. Big Data, Los Angeles, CA, 2019, pp. 2587–2596.
[150] D. Liu and O. Simeone, “Privacy for free: Wireless federated learning via uncoded transmission with adaptive power control,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 170–185, Jan. 2021.
[151] K. Wei et al., “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 3454–3469, Apr. 2020.
[152] Y. Li, T. H. Chang, and C. Y. Chi, “Secure federated averaging algorithm with differential privacy,” in Proc. IEEE 30th Int. Workshop Mach. Learn. Signal Process., Espoo, Finland, 2020, pp. 1–6.

Liangxin Qian received the bachelor's degree in communication engineering in 2019 from the University of Electronic Science and Technology of China, Chengdu, China, where he is currently working toward the master's degree. His research interests include multiple-input, multiple-output, machine learning, and index modulation technologies.

Ping Yang (Senior Member, IEEE) received the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, Sichuan, in 2013. He is currently a Full Professor with the University of Electronic Science and Technology of China. From 2012 to 2013, he was a Visiting Student with the School of Electronics and Computer Science, University of Southampton, Southampton, U.K. From 2014 to 2016, he was a Research Fellow with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He has authored or coauthored and presented more than 100 papers in journals and conference proceedings. His research interests include 5G and beyond wireless systems, machine learning, and bionic communication systems. He is the Editor of IEEE COMMUNICATIONS LETTERS and the Lead Guest Editor of IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING.

Ming Xiao (Senior Member, IEEE) received the bachelor's and master's degrees in engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2002, respectively, and the Ph.D. degree from the Chalmers University of Technology, Gothenburg, Sweden, in November 2007. From 1997 to 1999, he was a network and software engineer with China Telecom. From 2000 to 2002, he also held a position with the Sichuan Communications Administration. Since November 2007, he has been with the Department of Information Science and Engineering, School of Electrical Engineering and Computer Science, Royal Institute of Technology, Stockholm, Sweden, where he is currently an Associate Professor. He was the Editor of IEEE TRANSACTIONS ON COMMUNICATIONS during 2012–2017, and the Senior Editor of IEEE WIRELESS COMMUNICATIONS LETTERS during 2012–2016. Since January 2015, he has been the Senior Editor of IEEE COMMUNICATIONS LETTERS, and the Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS since 2018. He was the Lead Guest Editor of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS Special Issue on Millimeter Wave Communications for Future Mobile Networks in 2017.


Octavia A. Dobre (Fellow, IEEE) received the Dipl. Ing. and Ph.D. degrees from the Polytechnic Institute of Bucharest, Bucharest, Romania, in 1991 and 2000, respectively. Between 2002 and 2005, she was with the New Jersey Institute of Technology, Newark, NJ, USA. In 2005, she joined Memorial University, Canada, where she is currently a Professor and the Research Chair. She was a Visiting Professor with the Massachusetts Institute of Technology, Cambridge, MA, USA, and Université de Bretagne Occidentale, Brest, France.

She has authored or coauthored more than 400 refereed papers in her research areas, which include wireless communication and networking technologies, and optical and underwater communications.

Dr. Dobre is the Director of Journals and Editor-in-Chief (EiC) of the IEEE Open Journal of the Communications Society. She was the EiC of IEEE COMMUNICATIONS LETTERS, and a Senior Editor, Editor, and Guest Editor for various prestigious journals and magazines. She was also the General Chair, Technical Program Co-Chair, Tutorial Co-Chair, and Technical Co-Chair of symposia at numerous conferences.

Dr. Dobre was a Fulbright Scholar, Royal Society Scholar, and Distinguished Lecturer of the IEEE Communications Society. She was the recipient of Best Paper Awards at various conferences, including IEEE ICC, IEEE Globecom, IEEE WCNC, and IEEE PIMRC. Dr. Dobre is an Elected Member of the European Academy of Sciences and Arts, a Fellow of the Engineering Institute of Canada, and a Fellow of the Canadian Academy of Engineering.

Marco Di Renzo (Fellow, IEEE) received the Laurea (cum laude) and Ph.D. degrees in electrical engineering from the University of L'Aquila, Italy, in 2003 and 2007, respectively, and the Habilitation à Diriger des Recherches (Doctor of Science) degree from University Paris-Sud (now Paris-Saclay University), Paris, France, in 2013. Since 2010, he has been with the French National Center for Scientific Research (CNRS), where he is a CNRS Research Director (Professor) with the Laboratory of Signals and Systems (L2S) of Paris-Saclay University, CNRS and CentraleSupélec, Paris. In Paris-Saclay University, he serves as the Coordinator of the Communications and Networks Research Area of the Laboratory of Excellence DigiCosme, and as a Member of the Admission and Evaluation Committee of the Ph.D. School on Information and Communication Technologies. He is the Editor-in-Chief of IEEE COMMUNICATIONS LETTERS and a Distinguished Speaker of the IEEE Vehicular Technology Society. In 2017–2020, he was a Distinguished Lecturer of the IEEE Vehicular Technology Society and IEEE Communications Society. He has received several research distinctions, which include the SEE-IEEE Alain Glavieux Award, the IEEE Jack Neubauer Memorial Best Systems Paper Award, the Royal Academy of Engineering Distinguished Visiting Fellowship, the Nokia Foundation Visiting Professorship, the Fulbright Fellowship, and the 2021 EURASIP Journal on Wireless Communications and Networking Best Paper Award. He is a Fellow of the UK Institution of Engineering and Technology (IET), a Fellow of the Asia-Pacific Artificial Intelligence Association (AAIA), an Ordinary Member of the European Academy of Sciences and Arts (EASA), and an Ordinary Member of the Academia Europaea (AE). Also, he is a Highly Cited Researcher.

Jun Li (Senior Member, IEEE) received the Ph.D. degree in electronics engineering from Shanghai Jiao Tong University, Shanghai, China, in 2009. In 2009, he was a Research Scientist with the Department of Research and Innovation, Alcatel Lucent Shanghai Bell. From 2009 to 2012, he was a Postdoctoral Fellow with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW, Australia. From 2012 to 2015, he was a Research Fellow with the School of Electrical Engineering, The University of Sydney, Sydney, NSW, Australia. Since 2015, he has been a Professor with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, China. His research interests include network information theory, channel coding theory, wireless network coding, and cooperative communications.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, MD, USA, in 1999 and 2003, respectively. He is currently a Professor with the Department of Electrical and Computer Engineering and the Department of Computer Science, University of Houston, Houston, TX, USA.

Qin Yi received the bachelor's degree from the North University of China, Taiyuan, China, in 2020. She is currently working toward the master's degree with the University of Electronic Science and Technology of China, Chengdu, China. Her research interests include multiple-input, multiple-output, distributed learning, and beamforming.

Jiarong Zhao received the bachelor's degree in communication engineering from the Hefei University of Technology, Hefei, China, in 2020. She is currently working toward the master's degree with the University of Electronic Science and Technology of China, Chengdu, China. Her research interests include integrated sensing and communications, and index modulation technologies.
