IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 40, NO. 1, JANUARY 2022
Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
Khaled B. Letaief, Fellow, IEEE, Yuanming Shi, Senior Member, IEEE, Jianmin Lu, and Jianhua Lu, Fellow, IEEE
(Invited Paper)
Abstract— The thriving of artificial intelligence (AI) applications is driving the further evolution of wireless networks. It has been envisioned that 6G will be transformative and will revolutionize the evolution of wireless from “connected things” to “connected intelligence”. However, state-of-the-art deep learning and big data analytics based AI systems require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both the training and inference processes. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G to seamlessly integrate sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we shall provide our vision for scalable and trustworthy edge AI systems with an integrated design of wireless communication strategies and decentralized machine learning models. New design principles of wireless networks, service-driven resource allocation optimization methods, as well as a holistic end-to-end system architecture to support edge AI will be described. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.

Index Terms— 6G, edge AI, edge training, edge inference, federated learning, over-the-air computation, task-oriented communication, service-driven resource allocation, large-scale optimization, end-to-end architecture.

Manuscript received June 22, 2021; revised September 8, 2021; accepted October 11, 2021. Date of publication November 8, 2021; date of current version December 17, 2021. This work was supported in part by Project No. 20210400L016 under RDC Corporation Ltd. and by the Natural Science Foundation of Shanghai under Grant 21ZR1442700. (Corresponding author: Yuanming Shi.)
Khaled B. Letaief is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology (HKUST), Hong Kong, and also with the Peng Cheng Laboratory, Shenzhen 518066, China (e-mail: eekhaled@ust.hk).
Yuanming Shi is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China (e-mail: shiym@shanghaitech.edu.cn).
Jianmin Lu is with Huawei Technologies Company Ltd., Shenzhen 518066, China (e-mail: lujianmin@huawei.com).
Jianhua Lu is with the Department of Electronic Engineering and the Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: lhh-dee@mail.tsinghua.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSAC.2021.3126076.
Digital Object Identifier 10.1109/JSAC.2021.3126076
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
A. Roadmap to 6G: Vision and Technologies
WITH the standardization and worldwide deployment of 5G networks, researchers, companies, and governments have initiated the vision, usage scenarios, and disruptive technologies for future 6G. In particular, the United States [1], European Union [2], and China [3] have recently funded 6G projects with a common goal of enabling connected intelligence. Besides, the International Telecommunication Union (ITU) has published the system requirements and driving characteristics for Network 2030 [4]. To improve real-time immersive experience and interaction, as well as to accelerate intelligence upgrades for industrial internet-of-things (IoT) and digital twins, multiple companies are now considering new usage scenarios. For example, based on typical use cases in 5G [5], [6] (i.e., enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine type communications (mMTC)), Huawei has recently proposed three additional application scenarios in the vision of 5.5G. These include uplink centric broadband communication (UCBC), real-time broadband communication (RTBC), and harmonized communication and sensing (HCS) [7]. It is expected that 6G will go beyond the mobile internet to support ubiquitous artificial intelligence (AI) services and Internet of Everything (IoE) applications [1]–[4], [8], including sustainable cities, connected autonomous systems, brain-computer interfaces, digital twins, tactile and haptic internet, high-fidelity holographic society, extended reality (XR) and metaverse [9], e-health, etc. Researchers in industry and academia have published many visionary 6G proposals [10]–[12] to better understand, sense, control, and interact with the physical world. In particular, three new application services were envisioned for 6G, including computation oriented communications (COC), contextually agile eMBB communications (CAeC), and event defined uRLLC (EDuRLLC) [12]. Based on these quoted usage scenarios, we present the evolution of visionary use cases for 6G in Fig. 1 by integrating intelligence, coordination, sensing, and computing for a connected cyber-physical world.

Fig. 1. Towards 6G: the evolution of use cases from 5G to 6G.

To shape the future of 6G use cases in 2030, multidisciplinary research and various disruptive technologies are required, including spectrum exploration technologies, devices and circuit technologies, as well as networking, computing, sensing, and learning functionalities. In particular, AI, especially deep learning (DL), provides a revolutionary approach to design and optimize 6G wireless networks across the physical, medium-access, and application layers [12], [13]. Specifically, DL provides a novel way to design the 6G air interface by optimizing the radio environment [14], communication algorithms [15], hardware, and applications in a
unified way [16], [17]. This has inspired recent successful applications to joint source-channel coding (JSCC) [18], task-oriented communication [19], [20], and semantic communication [21]. Besides, machine learning (ML) also provides a paradigm shift for automatically learning high-performance and fast optimization algorithms to solve the resource allocation problems in wireless networks [22]–[25]. Domain knowledge (e.g., optimization models and theoretical tools) was further incorporated into the DL framework for optimizing ultra-reliable and low-latency communication networks [26]. An ML approach was also developed for addressing the communication, networking, and security challenges for vehicular applications [27]. With the development of wireless data collection, learning models and algorithms, as well as software and hardware platforms, we envision that AI will become a native tool to design disruptive wireless technologies for accelerating the design, standardization, and commercialization of 6G. On the other hand, the evolution of 6G wireless communication technologies and communication theory will also inspire the progress and development of AI techniques in terms of novel learning theory, new deep neural network (DNN) architectures, and customized software and hardware platforms.
Given the requirements of emerging 6G, connected intelligence is expected to be the central focus and an indispensable component in 6G [28]. This shall revolutionize the evolution of wireless from “connected things” to “connected intelligence”, thereby enabling the interconnections between humans, things, and intelligence within a hyper-connected cyber-physical world [12]. Edge AI provides a promising solution for connected intelligence by enabling data collection, processing, transmission, and consumption at the network edge [29], [30]. Specifically, by embedding the training capabilities across the network nodes, edge training is able to preserve privacy and confidentiality, achieve high security and fault-tolerance, as well as reduce network traffic congestion and energy consumption. For instance, over-the-air federated learning (FL) provides a collaborative ML framework to train a global statistical model over wireless networks without accessing edge devices’ private raw data [31]. By directly executing the AI models at the network edge, edge inference can provide low-latency and high-reliability AI services by requiring less computation, communication, storage, and engineering resources. For example, edge device-server co-inference is able to remove the communication and computation bottlenecks by splitting a large DNN model between edge devices and edge servers [32]. However, edge AI will cause task-oriented data traffic flows over wireless networks, for which disruptive wireless techniques, efficient resource allocation methods and holistic system architectures need to be developed. To embrace the era of edge AI, wireless communication systems and edge AI algorithms need to be co-designed for seamlessly integrating communication, computation, and learning.

B. Edge AI: Challenges and Solutions
Creating a trustworthy and scalable edge AI system will be of utmost importance for imbuing connected intelligence in 6G. The challenges of trustworthiness and scalability are multidisciplinary, spanning ML, wireless networking, and operations research. Specifically, trustworthiness in terms of privacy and security is one of the key requirements for 6G intelligent services and applications, for which the General Data Protection Regulation (GDPR) needs to be satisfied, and directly transmitting or collecting data from users is forbidden. To tame privacy leakages and adversarial attacks, various edge learning models and architectures have been proposed, including FL (i.e., server-client network architecture with data partition among edge devices) [33], [34], swarm learning (i.e., decentralized device-to-device (D2D) communication
architecture without central authority) [35], and split learning (i.e., model parameters partitioned among edge devices and edge servers) [36], [37]. Distributed reinforcement learning (RL) [38], [39] and trustworthy learning techniques [40], [41] were further proposed to address the dynamic and adversarial learning environments, respectively. In particular, differential privacy [42], Lagrange coded computing [43], secure multi-party computation, quantum computing, blockchain, and distributed ledger technologies can be further leveraged to build trustworthy edge AI architectures. However, with limited storage, computation, and communication resources in the wireless edge networks, deploying an edge AI system causes a significant scalability issue in terms of latency, energy and accuracy. To address this challenge, a paradigm shift for wireless system design is required from data-oriented communication (i.e., maximizing communication rate or reliability based on Shannon theory) to task-oriented communication (i.e., achieving fast and accurate intelligence distillation at the network edge).
In this paper, we shall provide a comprehensive picture for the design of scalable and trustworthy edge AI systems by matching the principles and architectures of wireless networks with the task structures of edge AI models and algorithms. The system performance metrics for edge AI are further characterized to facilitate efficient resource allocations based on operations research and ML. Specifically, to design a communication-efficient edge AI training system, we will provide novel multiple access schemes (e.g., over-the-air computation (AirComp) for model aggregation [31], [44], [45]) to support massive access for edge devices, new multiple antenna techniques (e.g., cell-free massive MIMO [46], [47] and reconfigurable intelligent surface (RIS) [48], [49]) to support fast exchange for high-dimensional model updates, and next-generation network architectures (e.g., space-air-ground integrated network (SAGIN) [50], [51]) to support diverse edge learning models and topologies. To design a communication-efficient edge inference system with low-latency and reliability guarantees, interference management, cooperative transmission, and task-oriented communication will be introduced to support edge device distributed inference [52], edge server cooperative inference [53], [54], and edge device-server co-inference [32], respectively. We then provide a holistic view for mathematically modeling the resource allocation problems in edge training and inference systems, which are categorized as mixed combinatorial optimization, nonconvex optimization and stochastic optimization models. A “learning to optimize” framework is further introduced to facilitate scalable, real-time, robust, parallel, distributed, and automatic optimization algorithm design for service-driven resource allocation in edge AI systems [22], [23], [25], [55]. We also provide a holistic end-to-end architecture for edge AI systems. Moreover, standardizations, resource allocation optimization solvers, software and hardware platforms, and application scenarios are discussed. The roadmap to the edge AI ecosystem is shown in Fig. 2 to encourage multidisciplinary collaborations among information science, computer science, operations research, and integrated circuits.

Fig. 2. Roadmap to edge AI.

C. Edge AI Empowered 6G Networks
The developed edge AI technology will serve as a distributed neural network to accelerate the evolution of sensing capabilities, communication strategies, network optimizations, and application scenarios in 6G networks. Specifically, edge AI paves the way for network sensing and cooperative perception to understand the network environments and services for agile and intelligent decision making. For example, edge simultaneous localization and mapping (SLAM) [56], [57] has recently been developed to deploy DL based visual SLAM algorithms on vehicles by edge inference. Edge AI can also help design AI-native communication strategies for the physical layer (e.g., task-oriented semantic communication [58]) and the medium access control layer (e.g., random access protocol [59]). For instance, an edge DL approach has been developed in [58] to deliver low-latency semantic tasks (e.g., text messages) by learning the communication strategies in an end-to-end fashion based on JSCC. Furthermore, edge AI provides a new paradigm for optimization algorithm design to enable service-driven resource allocation in 6G networks [60]. For instance, distributed RL [55], decentralized graph neural networks [23], and distributed DNN [61] are able to automatically learn the distributed resource allocation optimization algorithms. By seamlessly integrating sensing, communication, computation, and intelligence, edge AI shall empower 6G networks to support diversified intelligent applications, including autonomous driving, industrial IoT, smart healthcare, etc.
To further imbue native intelligence, native trustworthiness, and native sensing in 6G, mimicking nature for innovating edge AI empowered future networks can be envisioned. Inspired by the dynamic spiking neurons in the human brain, the energy consumption and latency of edge AI can be significantly reduced by processing the learning tasks in an event-driven manner [62], [63]. The brain-inspired stigmergy-based federated collective intelligence mechanism was proposed in [64] to accomplish multi-agent tasks (e.g., autonomous driving) through simple indirect communications. By leveraging the prior knowledge of the immune system and brain neurotransmission, a brand-new network security architecture and fully-decoupled radio access network have recently been proposed in [65] and [66], respectively. These results on nature-inspired edge AI models and network architectures provide strong evidence that one can establish an integrated data-driven and knowledge-guided framework to design and optimize 6G networks. Further details and description of the edge AI empowered 6G network are provided in Fig. 3, which highlights the integration of sensing, communication, computation and intelligence in a closed-loop ecosystem.

Fig. 3. Edge AI empowered 6G networks: integrated sensing, communication, computation, and intelligence.

D. Key Contributions
We provide extensive discussions, visions, and summaries of wireless techniques, resource allocations, standardizations, platforms, and application scenarios to embrace the era of edge AI for 6G. The major contributions are summarized as follows:
• The vision (i.e., connected intelligence for 6G), challenges (i.e., trustworthiness and scalability) and solutions (i.e., wireless techniques, resource allocations and system architectures) for edge AI, as well as edge AI empowered 6G networks, are introduced and summarized in Section I.
• The communication-efficient edge training system is presented in Section II, including the edge learning models and algorithms, followed by the promising wireless techniques and architectures to support their deployment.
• The communication-efficient edge inference system is introduced in Section III. Here, we introduce horizontal edge inference and vertical edge inference by cooperative transmission and task-oriented communication, respectively.
• A unified framework for resource allocation in edge AI systems is provided in Section IV. Here, we present operations research based theory-driven and machine learning based data-driven approaches for designing efficient resource allocation optimization algorithms.
• A holistic end-to-end architecture for edge AI systems is proposed in Section V, including network infrastructure, data governance, edge network function, edge AI management and orchestration.
• The standardizations, software and hardware platforms, and application scenarios are discussed in Section VI.
This will help facilitate the booming market of edge AI in the 6G era.
We summarize the main topics and relevant technologies as well as highlight the representative results in Table I.

II. COMMUNICATION-EFFICIENT EDGE TRAINING
In this section, we shall present various communication-efficient distributed optimization algorithms for edge training, followed by promising enabling wireless techniques to support the deployment of edge learning models and algorithms.

A. Edge Learning Models and Algorithms
The training process of edge AI models typically involves minimizing a loss or empirical risk function to fit a global model from decentralized data generated by a massive number of intelligent devices. The goal of the distributed optimization for edge training is to minimize the global loss function $L$, namely,
$$\underset{\theta \in \mathbb{R}^d}{\text{minimize}} \quad L(\theta) := \sum_{k \in S} w_k L_k(\theta; D_k), \qquad (1)$$
where $\theta \in \mathbb{R}^d$ are the model parameters, $L_k$ is the local loss function of device $k$ over local dataset $D_k$, $S$ denotes the set of participating edge nodes, and $w_k \ge 0$ with $\sum_{k \in S} w_k = 1$ denotes the weight for each local loss function. Considering the network topology for edge training, the heterogeneous local dataset $D_k$, varying device participation $S$, dynamic communication and computation environments, as well as privacy concerns and adversarial attacks, highly-efficient and trustworthy distributed optimization algorithms need to be developed. As shown in Fig. 4, based on the data partition and model partition principles [29], we will first introduce various edge training architectures, including FL, decentralized learning, and model split learning. We then present distributed RL and trustworthy learning techniques to accommodate dynamic and adversarial environments, respectively, as shown in Fig. 5.
1) Federated Learning: FL is a collaborative ML framework to train a global statistical model without accessing edge devices’ private raw data, wherein a dedicated edge server is responsible for aggregating local learning model updates and disseminating global learning model updates [34], as shown in Fig. 4 (a). FL is being adopted by many industrial practitioners, including Google’s Gboard mobile keyboard for next word prediction and emoji suggestion, Apple’s QuickType keyboard for vocal classifier, NVIDIA for COVID-19 patients’ oxygen needs prediction, and WeBank for money laundering detection [68]. Compared with the cloud data center based distributed learning, cross-device FL raises unique challenges for solving the distributed training optimization problems, including high communication costs with a large model frequently exchanged over wireless networks, statistical heterogeneity with non-identical local data distributions and sizes, system heterogeneity with varied storage, computation and communication capabilities, as well as dynamic device participation [122]. A growing body of recent work has developed effective methods to address these unique challenges in FL.
To address the challenge of expensive communication overheads for intermediate local updates with a central server, federated averaging [67] turns out to be effective in reducing the number of communication rounds by performing multiple local updates, e.g., running multiple stochastic gradient
TABLE I
AN OVERVIEW OF THE MAIN TOPICS AND REPRESENTATIVE RESULTS
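
As a rough illustration of objective (1) and of the local-update idea behind federated averaging [67], the following pure-Python sketch trains a small linear model across three simulated edge devices. The squared-error local loss, the noiseless synthetic data, the weighting w_k = |D_k| / Σ_j |D_j|, the step size, and all function names (`predict`, `local_loss`, `global_loss`, `local_sgd`, `fedavg`) are assumptions made for this example; they are not taken from [67] or from the systems discussed in this paper.

```python
import random


def predict(theta, x):
    """Linear model output <theta, x>."""
    return sum(t * xi for t, xi in zip(theta, x))


def local_loss(theta, data):
    """L_k(theta; D_k): mean of 0.5 * squared error over the local dataset."""
    return sum(0.5 * (predict(theta, x) - y) ** 2 for x, y in data) / len(data)


def global_loss(theta, datasets, weights):
    """Objective (1): L(theta) = sum_k w_k * L_k(theta; D_k)."""
    return sum(w * local_loss(theta, d) for w, d in zip(weights, datasets))


def local_sgd(theta, data, steps, lr, rng):
    """Run `steps` single-sample SGD iterations starting from the global model."""
    theta = list(theta)
    for _ in range(steps):
        x, y = rng.choice(data)
        err = predict(theta, x) - y
        theta = [t - lr * err * xi for t, xi in zip(theta, x)]
    return theta


def fedavg(datasets, weights, dim, rounds, local_steps, lr, seed=0):
    """Alternate local SGD on every device with weighted server-side averaging."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_sgd(theta, d, local_steps, lr, rng) for d in datasets]
        theta = [sum(w * m[j] for w, m in zip(weights, local_models))
                 for j in range(dim)]
    return theta


# Synthetic, noiseless data from a hypothetical ground-truth model (assumption).
data_rng = random.Random(1)
true_theta = [2.0, -1.0]
sizes = [20, 30, 50]  # unequal local dataset sizes
datasets = []
for n in sizes:
    xs = [[data_rng.uniform(-1, 1) for _ in true_theta] for _ in range(n)]
    datasets.append([(x, predict(true_theta, x)) for x in xs])

# Assumed weighting: w_k proportional to |D_k|, so w_k >= 0 and sum_k w_k = 1.
weights = [n / sum(sizes) for n in sizes]

theta = fedavg(datasets, weights, dim=2, rounds=50, local_steps=10, lr=0.1)
print("initial loss:", global_loss([0.0, 0.0], datasets, weights))
print("final loss:  ", global_loss(theta, datasets, weights))
```

In this sketch, `local_steps` controls how much local computation is traded for communication: with `local_steps=1` every round exchanges models after a single SGD step per device, while larger values let each device progress further between aggregation rounds.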
descent (SGD) iterations on each edge device. The local updating approach is able to learn a global model within much fewer communication rounds compared with the vanilla distributed SGD method, i.e., only running one mini-batch with SGD at each edge device. Model compression, such as quantization and sparsification, is another notable way to address the communication bottleneck by reducing the size of the exchanged messages during each model update round. Scalar quantization is a typical way to implement lossy compression for the high-dimensional gradient vectors by quantizing each of their entries to a finite-bit low precision value [123]–[125], which was further improved by the recent proposal of vector quantization [126], [127]. Sparsification, on the other hand, proposes to only communicate the informative elements of the gradient or model vectors among nodes [128], [129]. A set of algorithms combining the local updates method and model compression have shown the capability of achieving high communication efficiency [130], [131]. In particular, a lazily aggregated quantized gradient method was further proposed in [132] to reduce both the amount of exchanged data and communication rounds by reusing the outdated gradients for the less informative quantized gradients.

Fig. 4. Edge learning models and architectures.

Although the above periodic compressed update methods have shown empirical or theoretical success for tackling the communication challenge, the heterogeneity in systems and local datasets may slow down convergence or even cause divergence [133], [134], for which various algorithms and models have been proposed to address the statistical and system heterogeneity challenges. To learn the AI models from statistically heterogeneous local datasets, various effective and personalized models have been proposed to rectify the original model (1), including regularizing local loss functions at each device [134]–[136], distributionally robust modeling [137], [138], multi-task learning [139], as well as the meta-learning approaches [140]. Running a local update at the devices with heterogeneous computation capabilities may yield objective inconsistency or client drift, i.e., the learned model can be far from the desired true model. To address this problem, an operator splitting method was proposed to avoid the local models drifting apart from the global model [141]. A normalized model aggregation method was also developed to ensure that the global model converges to the desired true model [142]. A novel federated aggregation scheme was further developed in [143] to address the system heterogeneity issue concerning the dynamic, sporadic and partial device participation. To leverage the computation capabilities across the device-edge-cloud heterogeneous network, a hierarchical model aggregation approach was proposed in [130] to reduce the latency by controlling the two aggregation intervals.

Fig. 5. Edge learning modes in dynamic and adversarial environments.

2) Decentralized Learning: Decentralized ML learns a global model from inherently decentralized data structures via peer-to-peer communications over the underlying communication network topology without a central authority [144], as shown in Fig. 4 (b). It has great potential for applications in the autonomous industrial systems, including cooperative automated driving, cooperative simultaneous localization and
mapping, and collaborative robotics in advanced manufacturing environments [145]. The decentralized learning architecture harnesses the benefits of communication efficiency, computation scalability and data locality. In particular, swarm learning [35] provides a completely decentralized AI solution based on decentralized ML by keeping local datasets at each edge device. This can achieve high privacy, security, resilience and scalability. Compared with the server-client learning architecture in FL, decentralized learning can accommodate the decentralized D2D communication network architectures and protocols with arbitrary connectivity graphs (e.g., cooperative driving and robotics networks). It can also overcome the straggler dilemma with heterogeneous hardware, as well as improve the robustness to data poisoning attacks and master node failures [35], [145]. The convergence behavior of decentralized learning highly depends on the decentralized averaging mechanism and the network topology for data exchange [71]. Typical decentralized aggregation approaches include the consensus-based methods [69] and diffusion strategies [70].
To improve the communication efficiency for exchanging the locally updated models at edge devices within their neighbors, one may reduce either the number of communication rounds (i.e., improve convergence rate) or the volume of exchanged data per round. Specifically, the variance reduction with the gradient tracking method was investigated in [146] to achieve a fast convergence rate. Periodic averaging via running multiple local updates before decentralized averaging is an effective way to reduce the number of communication rounds among devices [147], [148]. Besides, quantizing or sparsifying the locally updated models can reduce the volume of the exchanged messages to address the communication bottleneck [149]. A consensus distance controlling framework was further developed in [150] to achieve the trade-off between the learning performance and the exactness of decentralized averaging for decentralized DL. Moreover, a communication network topology design is also critical to improve the communication efficiency [151], for which a group alternating direction method of multipliers [152] was proposed to form a connectivity chain by dividing the workers into head and tail workers. To address the heterogeneity issue of local datasets, the momentum-based method [153] has recently been developed to achieve good generalization performance.
3) Model Split Learning: Model split learning enables a collaborative learning process across the edge devices and edge servers by partitioning the model parameters across the edge nodes, as shown in Fig. 4 (c). That is, each edge node $k$, including edge devices and edge servers, is only responsible for updating $\theta_k$ with $\theta = [\theta_1, \theta_2, \ldots, \theta_S]$ in (1). This model splitting architecture can achieve higher privacy levels and better trade-offs between communication and computation. It is thus particularly applicable for DL with a large model parameter size, whereas the data partition based training method, e.g., FL, normally requires the local update of a whole copied global model at each involved edge device. The model parameter partitioned edge learning approach [72] proposes to train only a block of model parameters based on the coordinate descent method for the decomposable ML models [154] or the alternating minimization approach for general DL models [155]. However, this approach is prone to data privacy leakage as the datasets need to be shared across edge devices. Vertical FL, on the other hand, can directly learn the global model from the partitioned data features among different edge devices without sharing them [156]. Therefore, the data features and the associated model parametric blocks are split among edge devices, for which the asynchronous SGD method can be applied for vertical FL [157]. Consensus algorithms were also developed in [158] to jointly learn a model under the decentralized network while keeping the distributed data features locally.
Split DL further provides a flexible way to train a DNN by dividing it into lower and upper segments located at the edge device-side and edge server-side, respectively [36]. It is typically applied to medical diagnosis and millimeter wave channel prediction [37]. Split DL is able to preserve privacy without sharing raw data and enjoys computation scalability by letting edge devices perform only the simple computation for the lower segments. Compared with FL, split DL can significantly improve computation efficiency, reduce communication costs, as well as achieve higher learning accuracy, data security and system scalability. Specifically, edge devices and edge server collaboratively train the whole neural network, which involves routing the activations of the edge device-side subnetwork to the edge server via forward propagation, and downloading the gradients of the edge server-side subnetwork to update the lower segment via back propagation. However, exchanging the instantaneous intermediate values between edge devices and edge server becomes the communication bottleneck, especially in the case with multiple edge devices. Therefore, a joint communication strategy and neural network architecture design is required [37] for split training of various DNNs with heterogeneous edge devices. Considering the large-scale privacy-sensitive and delay-sensitive IoT applications, Lyu et al. [159] proposed a hybrid fog-based privacy-preserving DL framework, where a fog-level DNN is partitioned between the edge device and the fog server side.
4) Distributed Reinforcement Learning: RL provides a flexible framework for sequential decision making in dynamic settings by interacting with a dynamic environment, as shown in Fig. 5 (a). This can be frequently modeled as decision making and learning in a Markov decision process (MDP) [160]. Typical RL algorithms include the model-based algorithm, policy-based algorithm (e.g., natural policy gradient), value-based algorithm (e.g., Q-learning), and actor-critic method. In particular, an asynchronous method, by leveraging parallel computing, was developed in [161] to solve the large-scale nonconvex RL problem. However, in modern intelligent applications, e.g., autonomous driving and robotics, it is critical to consider multi-agent reinforcement learning (MARL), in which multiple agents collaboratively interact with a common environment to complete a common goal and maximize a shared team reward with different local action spaces [73]. Due to the enormous state-action space, delayed rewards and feedback, as well as the non-stationary and unknown environments with heterogeneous agents’ behaviors, efficient communication strategy among multiple agents shall
play a key role to achieve good and stable performance for MARL.
For the server-client architecture based MARL, the edge server coordinates the learning process for all the edge agents. Lowe et al. [162] proposed a multi-agent actor-critic method involving decentralized actors at each agent and a centralized critic for parameter sharing among the agents. To improve the communication efficiency of the distributed policy gradient for MARL, a lazily aggregated policy gradient was developed in [38] to reduce the communication rounds by only communicating informative gradients of partial agents while reusing the outdated gradients for the remaining agents. For applications without central coordinators, e.g., autonomous driving, decentralized MARL is essential wherein the agents only allow the exchange of messages with their neighbors over a communication connectivity graph [163]. Zhang et al. [39] proposed decentralized actor-critic algorithms with function approximation, where each agent makes individual decisions based on both the information observed locally and the messages shared through a consensus step over the network. A decentralized entropy-regularized policy gradient method by only sharing information with neighbor agents was developed in [164] to learn a single policy for multi-task RL with multiple agents operating in different environments.
5) Trustworthy Learning: To learn and deploy AI models for high-stakes applications (e.g., autonomous driving) at the network edge, it is critical to ensure privacy, security, interpretability, responsibility, robustness, and fairness for the edge learning processes, as shown in Fig. 5 (b). However, the heterogeneity of massive scale edge systems and decentralized datasets raises unique challenges to design trustworthy edge AI techniques. Although FL addresses the local confidentiality issue by keeping datasets locally, the shared model updates still cause extreme privacy leakage (e.g., model inversion attack), the learned global model can be compromised by colluding malicious attackers [40], [165], and the edge devices may be adversarial attackers (e.g., data or model poisoning). This calls for rigorous privacy-preserving mechanisms and secure aggregation rules [43]. Differential privacy provides a promising lightweight privacy-preserving mechanism to guarantee a level of privacy disclosure for local datasets by adding random perturbations [42]. The additive noise and signal superposition properties in the wireless channel can be naturally harnessed as the privacy-preserving mechanism [41]. The resulting inherent noisy model aggregation scheme can limit the privacy disclosure of local datasets at the edge server for free while keeping the learning performance unchanged [41], [166]. To improve the communication efficiency for private distributed learning, Chen et al. [167] developed efficient encoding and decoding mechanisms to simultaneously achieve optimal communication efficiency and differential privacy under typical statistical learning settings.
Apart from preserving privacy for individual users, edge AI also needs to be robust to errors and adversarial attackers, as the decentralized nature makes it easy to be unreliable in the learning process or even completely controlled by external attackers [74]. To address Byzantine attacks (i.e., the faulty edge device can behave arbitrarily badly by modifying its local updates) in FL with a server-client architecture, various robust and secure model aggregation schemes (e.g., geometric median [168], trimmed mean [169], and Krum [170]) were proposed to tolerate the Byzantine corrupted edge devices. To simultaneously preserve privacy for individual users while tolerating Byzantine adversaries, a Byzantine-resilient secure aggregation framework was developed in [171] to detect adversarial models without the knowledge of individual local models, as they are masked for privacy guarantees. To further avoid malicious edge servers, blockchain technology was utilized to provide a decentralized consensus environment to guarantee the validity of global models in every learning iteration. This is achieved by packing the local models and global model into blocks, which are confirmed under a consensus mechanism, followed by linking them into the blockchain [172]. To protect decentralized learning from attacks, a blockchain-based peer-to-peer network was developed in [35] to support swarm learning without a central server. This high security level in decentralized learning is achieved by securely enrolling new nodes via blockchain smart contract to perform local model training.
To summarize, the presented edge learning models and algorithms provide strong evidence that to deploy the edge training process in wireless networks, we need to develop new wireless communication techniques and strategies to support massive and flexible edge device participation, as well as support efficient function computation for model aggregation (e.g., weighted sum global model aggregation in FL, consensus model aggregation in decentralized learning, and robust model aggregation in secure learning). Various edge training architectures (e.g., server-client, decentralized, and hierarchical network topologies), as well as high-dimensional model update exchange motivate us to develop new wireless network principles and architectures to support edge AI training systems, which will be discussed in the following subsection.

B. Wireless Techniques for Edge Training
As the communication target for edge AI becomes the learning performance instead of the conventional data rates, we shall exploit the task structures of edge AI models and algorithms to match the principles and architectures of wireless networks. This helps demystify the efficiency of edge training in wireless networks, which yields a learning-communication co-design principle for future 6G wireless networks to enable AI functionalities sitting natively within 6G. As shown in Fig. 6, we will introduce next generation multiple access schemes (e.g., AirComp and massive random access) to accommodate a massive number of edge devices dynamically involved in the training process, new multiple antenna techniques (e.g., RIS and cell-free massive MIMO) to support high-dimensional model update exchange, as well as new network architectures (e.g., SAGIN and unmanned aerial vehicle (UAV) network) to support diversified edge training models and topologies.
1) Over-the-Air Computation: Edge training tasks typically involve computing aggregation functions of multiple local
Fig. 6. Enabling wireless techniques for edge training.
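To make the over-the-air computation idea of Fig. 6 (a) concrete, the following minimal Python sketch simulates one AirComp aggregation round with channel-inversion precoding. It is only an illustration under simplifying assumptions that are ours, not the cited works': real-valued local updates, a single-antenna receiver, perfect CSI at every device, and no transmit power constraint. The function names (transmit_scaling, aircomp_aggregate, transmit_power) and all numerical values are hypothetical.

```python
import random


def transmit_scaling(weight, channel):
    """Channel-inversion precoder b_k = w_k / h_k (assumes perfect CSI at the device)."""
    return weight / channel


def aircomp_aggregate(updates, weights, channels, noise_std=0.0, seed=0):
    """Simulate one over-the-air aggregation round.

    updates  : list of K real-valued local update vectors of equal length
    weights  : list of K aggregation weights w_k
    channels : list of K nonzero complex fading coefficients h_k
    Returns the real part of the received signal, i.e. a noisy estimate
    of the weighted sum  sum_k w_k * x_k.
    """
    rng = random.Random(seed)
    dim = len(updates[0])
    received = [0j] * dim
    for x_k, w_k, h_k in zip(updates, weights, channels):
        b_k = transmit_scaling(w_k, h_k)
        for i in range(dim):
            # The channel multiplies each transmitted symbol and the
            # multiple access channel adds the contributions of all devices.
            received[i] += h_k * (b_k * x_k[i])
    # Additive receiver noise on the in-phase and quadrature components.
    noisy = [y + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
             for y in received]
    return [y.real for y in noisy]


def transmit_power(update, weight, channel):
    """Energy one device spends on its update: |b_k|^2 * ||x_k||^2."""
    b_k = transmit_scaling(weight, channel)
    return abs(b_k) ** 2 * sum(v * v for v in update)


if __name__ == "__main__":
    updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    weights = [0.2, 0.3, 0.5]
    # Hypothetical channels; the last device is in a deep fade.
    channels = [0.8 + 0.1j, -0.3 + 0.9j, 0.05 - 0.02j]
    print(aircomp_aggregate(updates, weights, channels, noise_std=0.0))
    print(aircomp_aggregate(updates, weights, channels, noise_std=0.1))
    print([transmit_power(x, w, h) for x, w, h in zip(updates, weights, channels)])
```

With zero noise the receiver recovers the weighted sum regardless of the number of devices, since all of them transmit in the same resource block. The per-device energy printed last grows as 1/|h_k|^2, which illustrates why plain channel inversion can conflict with a device power budget when a channel is weak.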
model updates to update a global model. To accomplish weighted averaging aggregation in FL, consensus aggregation in decentralized learning, and robust aggregation (e.g., geometric median) in trustworthy learning, the local updates need to be transmitted from the edge devices, followed by computing the relevant aggregation function at the edge server. However, the limited bandwidth and resources in wireless networks become one of the key bottlenecks to enable a massive number of edge devices that upload the local model updates for global aggregation. AirComp provides a new multiple access scheme for low-latency model aggregation. By concurrently transmitting the locally updated models, AirComp can harness interference to reduce communication bandwidth consumption. The key idea is that the waveform superposition of a wireless multiple access channel can be exploited for computing the nomographic functions (e.g., the model aggregation function weighted average) over the same channel [118], as shown in Fig. 6 (a). Specifically, the transmitted signals at edge devices are first multiplied by the fading channels and then superposed over-the-air with additive channel noise, resulting in a noisy weighted sum of transmitted signals [31]. This perfectly matches the structure of model aggregation computation. Note that the robust aggregation function (i.e., geometric median) does not have the additive structure. But we can still approximate it by computing a small number of weighted averaging functions via AirComp [93]. The communication latency and bandwidth requirement of AirComp will not increase with the number of edge devices, thus relieving the communication bottleneck in the edge training process.
Channel fading and noise perturbation in the model aggregation raise unique challenges for the edge training algorithm design and analysis. To tackle the channel fading perturbation, a channel inversion method was proposed in [44], [45], [173] by multiplying the inverse of channel gain for the transmit signal, which may however not satisfy the power constraint at edge devices. To address these issues, a transceiver design was provided in [31] to minimize the distortion for the perturbed model aggregation, whereas the perturbed model updates are directly incorporated in the FL algorithm design [166]. Although the analog transmission in AirComp is prone to channel noise, the additive noise in the model aggregation turns out to be controllable or even beneficial in the edge training process. Specifically, the channel noise in the model aggregation yields a new class of noisy FL algorithms. The convergence behavior demonstrates that the noisy iterates typically introduce a non-negligible optimality gap in various FL algorithms, e.g., vanilla gradient method [174], quantized gradient method [175], sparsified gradient method [173], and operator splitting method [87]. The optimality gap can be further controlled by transmit power allocation [41], [173], [176], model aggregation receiver beamforming design [31], [177], [178], and device scheduling [31], [178], [179]. Besides, channel perturbation in algorithm iterates can also serve as the mechanism to design saddle-point escaping algorithms [94], thereby establishing global optimality for training the non-convex over-parameterized neural networks in high-dimensional statistical settings [180]. The additive channel noise in model aggregation can also serve as an inherent privacy-preserving mechanism to guarantee differential-privacy levels for each edge device without sacrificing learning performance [41].
2) Massive Access Techniques: Deploying cross-device FL in IoT networks raises practical challenges, i.e., the IoT
devices have sporadic access to the wireless network [181]. It is thus critical to design practical FL systems to accommodate flexible device participation with sporadic access to the wireless network [143], as shown in Fig. 6 (b). The grant-free random access protocol provides a low-latency and low signaling overhead way to detect the active devices, followed by decoding their corresponding information data [75], [182], [183]. In this protocol, active devices can transmit the data signals directly without waiting for any permission. Sparse signal processing provides a promising modeling framework to simultaneously detect the active devices and estimate their channels [75], [76], which is supported by various efficient algorithms, including the approximate message passing algorithm [184], [185] and DNN algorithm unrolling approach [97], [186]. To further reduce the latency for data decoding in random access, a sparse blind demixing framework was developed in [187] by simultaneously performing active device detection, channel estimation and their data decoding. The key observation is that blind demixing is able to perform low-latency data decoding for multiple users from the sum of bilinear measurements without channel estimation at both the transmitters and receivers [188], [189]. To enhance the performance, the common sparsity pattern in pilot and user data has been exploited via joint activity detection and data decoding [190], [191].
Random access protocols are promising to support flexible and massive device participation in the edge training process by identifying active devices with sporadic traffic. It is still critical to develop massive access techniques to improve the learning performance by enrolling more active devices to perform local model update and exchange under digital transmission. Nonorthogonal multiple access (NOMA) [77], [78] is a key enabling candidate technology to simultaneously serve massive devices for model aggregation in the same radio resource block via superposition coding. Typical NOMA schemes include the power-domain NOMA with different transmit powers as weight factors and the code-domain NOMA (e.g., sparse code multiple access [192] and pattern division multiple access [78]) with different codes assigned to users. Therefore, the user’s data can be decoded from the simultaneously transmitted signals via successive interference cancellation. In particular, DL provides a powerful method to design and optimize NOMA systems [193]–[195]. Under analog uncoded transmission, interference can be harnessed via the new massive access technique AirComp, for which Dong et al. further proposed a blind AirComp for low-latency model aggregation without channel state information (CSI) access [79]. It is thus particularly interesting to integrate a massive random access protocol (e.g., grant-free random access) and massive access technique (e.g., AirComp based access technique) with analog uncoded transmission to simultaneously perform active device detection, channel estimation and model aggregation, thereby supporting flexible and low-latency edge devices enrolling for collaboratively training the models.
3) Ultra-Massive MIMO: Leveraging massive antenna arrays is a key enabling wireless technology to achieve high spectral and energy efficiency, which is envisioned to be further scaled up by an order-of-magnitude in 6G [8]. The recent advances in digital beamforming, analog beamforming, as well as hybrid beamforming have helped the roll-out of massive MIMO into practice by operating over a wider frequency band. It has been demonstrated that massive MIMO is able to bring enormous benefits for edge training systems, including high accuracy and high rate for model aggregation, as well as high reliability for massive device connectivity. Specifically, massive MIMO can achieve a high computation accuracy for model aggregation via exploiting spatial diversity [196], and enable ultra-fast model aggregation with simultaneous multi-function computation by spatial multiplexing [197]. Furthermore, for FL with edge devices sporadically enrolling, the device activity detection error goes to zero as the number of antenna elements in the BS goes to infinity, thereby achieving highly reliable device participation for model updates.
To scale edge training to huge physical areas with massive geographically distributed edge devices, the ultra-dense wireless network is a promising way to achieve low latency, high reliability and high performance. This is achieved by simultaneously uploading massive local model updates to multiple distributed edge servers with abundant communication, computation, and storage resources, thereby mitigating the straggler issues (i.e., devices with low communication and computation capabilities may prolong the training time) and unfavorable channel dynamics. Besides, compared with the single edge server architecture, distributed edge servers are robust to server failure issues for reliable edge training. In particular, cloud radio access network (Cloud-RAN) [198], [199] provides a cost-effective way to implement distributed antenna aided edge training systems, for which reliable model aggregation via AirComp can be achieved by centralized signal processing and shortening the communication distances between edge devices and edge servers [200]. The recent proposal of cell-free massive MIMO [201] serves as a promising way to realize the wireless distributed FL systems by exploiting the channel hardening characterization (i.e., the effective channel gain is approximated by its expectation value) and avoiding sharing instantaneous CSI among edge servers [46], as shown in Fig. 6 (c).
4) Reconfigurable Intelligent Surfaces: To obtain the desired average function of local model updates for model aggregation via AirComp, magnitude alignment by scaling the transmit signals (e.g., channel inversion) is normally required to reduce the channel perturbation [202]. However, due to the resource-limited edge devices and the non-uniform fading channels, the unfavorable signal propagation environment inevitably leads to magnitude reduction and misalignment with perturbed model aggregation, which in turn degrades the learning performance of the edge training process. Besides, the massive edge devices with sporadic access to the edge servers can be located at a service dead zone, which makes device activity detection challenging for weak channel links [185]. To enroll multiple edge devices via simultaneous transmission with NOMA, sufficient diversified channel gains are normally required for successive interference cancellation, which however may not always hold in practical scenarios [203]. Heterogeneity in terms of computation, communication,
and storage across edge devices is one of the major challenges to deploy edge AI systems. Waiting for the straggler edge devices with slow computation and communication speeds for model aggregation causes significant delays, which can be tackled by computation offloading and task scheduling by mobile edge computing (MEC) technique [204]. However, fully unleashing the benefits of MEC for straggler mitigation is limited by the hostile wireless links [205].
To address the above challenges in terms of propagation impairments, RIS has been shown to be a cost-effective technology to support fast yet reliable model aggregation with massive edge device participation by programming the propagation environment of electromagnetic waves [118], [206], as shown in Fig. 6 (d). Specifically, RIS is typically realized by planar or conformal artificial metamaterials or metasurfaces equipped with a large number of low-cost passive reflecting elements, which are capable of adjusting the phase shifts and amplitudes of the incident signals for directional signal enhancement or nulling, and thus altering the propagation of the reflected signals [49], [207], [208]. To design an RIS-empowered edge training system, RIS can be leveraged to align the magnitudes of the transmit signals by establishing favorable propagation links in waveform superposition for AirComp, resulting in boosted received signal power and accurate aggregated function at the edge server [209]. The boosted model aggregation via RIS can support efficient edge device scheduling in over-the-air FL, thereby adapting to the time-varying local model updates and channel dynamics [48], [178]. The reliable sporadic access in edge training can be developed by establishing abundant propagation scatterers using RIS for accurate activity detection [185]. The latency for local model updates of the active devices can be further reduced by establishing favorable propagation links via RIS, thereby mitigating stragglers [205].
5) Space-Air-Ground Integrated Networks: The typical SAGIN [51], [210] provides an integrated space information platform across the satellite networks (e.g., miniaturized satellites [211]), aerial networks (e.g., UAV communications [212]), and terrestrial communications (e.g., vehicular communications [213]) to provide ubiquitous connectivity for various edge training architectures, as shown in Fig. 6 (e). Edge learning over a vehicle-to-everything network is critical to enable autonomous driving with delay-sensitive applications [145]. In this scenario, the local model updates need to be quickly and reliably aggregated within neighbors via vehicle-to-vehicle communications [181], or to the roadside units via vehicle-to-infrastructure communication. In particular, radar sensing provides a promising way to predict the vehicular links [214] and holds the potential to provide real-time model aggregation via predictive beamforming in the model aggregation procedure. In the scenario with sparsely deployed edge servers and moving edge devices (e.g., ground vehicles), UAVs, serving as the flying edge servers, can provide a promising solution to aggregate local model updates in the whole procedure of edge training by joint UAV trajectory and transceiver design over dynamic wireless edge networks [81].
To build a scalable edge training system with massive device participation for training extremely deep AI models [215], it is critical to access abundant computation resources across the continuum of nodes from edge devices, edge servers, to cloud servers [50]. It was shown in [130] that the client-server-cloud multi-layer architecture is able to significantly reduce the training time and energy consumption. In the scenario without abundant edge and cloud computing infrastructures, SAGIN provides a ubiquitous computing platform for the multi-layer hierarchical edge learning system, where the flying UAVs serve as the proximal edge computing, and the low earth orbit satellites serve as the relays to the cloud computing [216]. To realize a SAGIN-empowered edge training system, tier-adaptive aggregation interval management becomes critical to control the local and global model aggregation intervals [130] to achieve high communication efficiency. Besides, the client-edge-satellite association with dynamic scheduling and offloading is fundamental to tackle the heterogeneity challenges in terms of system resources and network topologies.
In summary, this section presented multiple access technologies (e.g., AirComp, grant-free random access, NOMA), multiple antenna techniques (e.g., Cloud-RAN, cell-free massive MIMO, RIS), and multiple layer networks (e.g., UAV, SAGIN) that are needed to support low-latency model aggregation and diversified learning architectures and environments. We hope this can inspire more advanced 6G wireless and information techniques (e.g., millimeter-wave and terahertz (THz) communications [217], [218], age of information [219]) to support edge AI systems for establishing integrated communication, computation and learning ecosystems.

III. COMMUNICATION-EFFICIENT EDGE INFERENCE
In this section, we present communication-efficient techniques for edge inference tasks with latency and reliability guarantees. Based on the dataset distribution characteristics, Yang et al. [33] proposed to categorize FL as horizontal FL (i.e., datasets share the same feature space but different sample space) and vertical FL (i.e., datasets share the same sample space but differ in the feature space). Hosseinalipour et al. [50] further proposed a fog learning framework by allowing both vertical communications (i.e., model updates are only exchanged across different network layers) and horizontal communications (i.e., model updates can be exchanged between devices in the same network layer). In a similar way, based on different computing collaboration schemes, we shall propose to categorize edge inference as horizontal edge inference (i.e., computation resources can only be harvested among edge devices, or only be pooled among edge servers), and vertical edge inference (i.e., computation resources can be harnessed between edge devices and edge servers), which are discussed in the following two subsections, respectively.

A. Horizontal Edge Inference
We consider two different types of horizontal edge inference, as shown in Fig. 7 (a) and Fig. 7 (b).
1) Edge Device Distributed Inference: Enormous efforts on TinyML with DL model compression and neural network architecture search have been conducted to enable low-latency
Fig. 7. Communication-efficient edge inference systems.
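For the edge device-server co-inference setting of Fig. 7 (c), the trade-off between on-device computation and feature transmission can be illustrated with a small Python sketch that enumerates all split points of a layered model and picks the one with the lowest end-to-end latency. The latency model (device compute + uplink transmission + server compute, with result delivery ignored), the function names split_latency and best_split, and every number in the example are assumptions made here for illustration; this is not the split selection method of any cited work.

```python
def split_latency(split, layer_flops, feature_bits, input_bits,
                  device_flops_per_s, server_flops_per_s, uplink_bits_per_s):
    """End-to-end latency in seconds when layers [0, split) run on the device.

    split == 0            : the raw input is offloaded to the server
    0 < split < L         : the output feature of layer split-1 is transmitted
    split == L (= layers) : everything runs on the device, nothing is sent
    """
    num_layers = len(layer_flops)
    if split == 0:
        tx_bits = input_bits
    elif split < num_layers:
        tx_bits = feature_bits[split - 1]
    else:
        tx_bits = 0.0
    device_time = sum(layer_flops[:split]) / device_flops_per_s
    server_time = sum(layer_flops[split:]) / server_flops_per_s
    return device_time + tx_bits / uplink_bits_per_s + server_time


def best_split(layer_flops, feature_bits, input_bits,
               device_flops_per_s, server_flops_per_s, uplink_bits_per_s):
    """Exhaustively search the split point with minimum latency."""
    candidates = list(range(len(layer_flops) + 1))
    latencies = [
        split_latency(s, layer_flops, feature_bits, input_bits,
                      device_flops_per_s, server_flops_per_s, uplink_bits_per_s)
        for s in candidates
    ]
    best = min(candidates, key=lambda s: latencies[s])
    return best, latencies[best]


if __name__ == "__main__":
    # Hypothetical 4-layer model. The first layer outputs more bits than the
    # raw input (data amplification), later layers shrink the feature.
    layer_flops = [2e8, 4e8, 4e8, 2e9]
    feature_bits = [8e6, 2e6, 1e5, 8e3]
    input_bits = 1.2e6
    args = (layer_flops, feature_bits, input_bits, 1e9, 5e10, 1e6)
    for s in range(len(layer_flops) + 1):
        print(s, split_latency(s, *args))
    print(best_split(*args))
```

In this toy setting, splitting right after the first layer is worse than offloading the raw input because the intermediate feature is larger than the input, while a deeper split pays more local computation in exchange for a much smaller feature.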
and energy-efficient model inference on a single device with limited storage and computation resources [220]. However, due to limited storage capability at edge devices, it becomes extremely difficult to accomplish inference computation tasks at a single device, for applications such as mobile navigation with a huge map information dataset [221]. Edge device distributed inference based on wireless MapReduce enjoys the advantages of providing low-latency, highly accurate, scalable, and resilient services for edge devices without accessing the cloud data center [12], [29]. Specifically, edge device distributed inference involves computing the intermediate values based on the local input datasets using the map function, followed by sharing the intermediate values via horizontal communication among edge devices, thereby constructing the desired computation or inference results using the reduce function [52], [222].
To tackle the communication bottleneck for shuffling intermediate values in the edge device distributed inference process, a coded distributed computing approach [223] was adopted in [221] to improve the scalability of wireless MapReduce by inducing the coded multicasting transmission opportunities. This, however, sacrifices computation efficiency as computation replication of the local dataset is needed. To further improve the spectral efficiency, instead of reducing the volume of communication bits [221], a joint uplink and downlink design approach based on the interference alignment principle was developed in [52] to improve the communication data rates for local intermediate values shuffling. In particular, to compute the nomographic function [224] for edge device distributed inference based on the MapReduce decomposition, a multi-layer hierarchical AirComp approach was proposed in [225] to improve the spectral efficiency over the multi-hop D2D communication network, as shown in Fig. 7 (a).
2) Edge Server Cooperative Inference: DL with high-dimensional model parameters is able to provide highly accurate intelligent services. However, it is challenging to directly deploy such large AI models on IoT devices due to very limited onboard computation, storage and energy resources. Deploying and executing DL models on edge servers turns out to be a promising solution. However, the limited wireless bandwidth between edge devices and edge servers becomes the key bottleneck [53], [54] for edge server cooperative inference. Compressing and encoding the input source data at edge devices are essential to reduce the uplink communication overheads, for which various data dimensionality reduction approaches have been proposed by exploiting the specific computation tasks and communication environments [29]. Besides, for the applications with high-dimensional output inference results (e.g., the output of the NVIDIA’s AI system GauGAN is a large-sized photorealistic landscape image), it is equally
important to design highly efficient downlink communication approach was proposed in [32] to limit the number of the
solutions for delivering the output inference results for the activated neurons at the last layer of neural network deployed
edge devices [53], [54]. at the edge device. However, the short message transmission
Computation replication has been shown to be effective for [37] and data amplification effect [229] of the output features
reducing the communication latency in computation offload- extracted by the on-device split model raise unique challenges
ing when the output size is large [226]. This is achieved to realize real-time vertical edge inference.
by executing each inference task at multiple edge servers, 2) Ultra-Reliable and Low-Latency Communication: The
followed by delivering the inference results for multiple packet length of the extracted output features transmitted from
edge devices via downlink cooperative transmission [227]. the edge devices can be very short [83], [145], for which the
Although edge server cooperative inference via downlink achievable data rate in such a finite block length regime is
transmission cooperation is able to significantly improve com- penalized by a non-vanishing decoding error probability [83].
munication efficiency by mitigating interference and alleviat- Besides, the output inference results from the edge server
ing channel uncertainties, it causes extra energy consumption should be delivered to the edge devices with latency and
to execute the same inference tasks at multiple edge servers. reliability guarantees for mission-critical applications. Consid-
To design a green edge server cooperative inference system, ering the system dynamics, including task arrival dynamics in
joint inference task selection and downlink coordinated beam- the network layer and the wireless channel dynamics in the
forming framework was proposed in [53] to minimize the physical layer, cross-layer optimization is needed to minimize
overall computation and communication energy consumption, the end-to-end delay for edge device-server co-inference [84],
as shown in Fig. 7 (b). RIS was further leveraged in [54] [85]. In particular, MDP supported by linear programming
to design green edge server cooperative inference systems was adopted in [85] to jointly schedule the transmission at
by considering both uplink and downlink transmit power edge devices and computation at the edge server for achieving
consumption. The rate splitting method is also anticipated the optimal power-latency tradeoff for edge device-server co-
to be able to further improve the energy-efficiency for edge inference via MEC. The random delay characteristics were
server cooperative inference by partially decoding the infer- also investigated in [230] by modeling the coupled transmis-
ence result and partially treating it as noise in a flexible sion and computation process as a discrete-time two-stage
way [228]. tandem queueing system. To support multiple edge devices
for uploading intermediate features using short packet trans-
B. Vertical Edge Inference

We consider two different cases of vertical edge inference, as shown in Fig. 7 (c) and Fig. 7 (d), with a single edge device and multiple edge devices, respectively. In the following, we shall first present effective techniques for communication-efficient vertical edge inference for these two cases, and then present a new general design principle for resource-constrained vertical edge inference, named task-oriented communication.

1) Edge Device-Server Co-Inference: Edge device distributed inference enjoys low latency, but its accuracy is limited by the restricted processing capabilities and bandwidth. Although edge server cooperative inference is able to achieve high accuracy with DL models, it may raise data leakage issues and incur excessive communication delay. It thus becomes inapplicable for privacy-sensitive and delay-sensitive applications. To provide ubiquitous AI services across diversified application scenarios, edge device-server co-inference, as a complementary solution to horizontal edge inference, is promising to alleviate the communication overheads while achieving high accuracy and privacy for inferring the DNN models. This is achieved by dividing the DNN model into a computationally friendly segment at the edge device and the remaining computationally heavy segment at the edge server [32], as shown in Fig. 7 (c). Adaptively partitioning the computation burden between the edge devices and the edge server through model split selection is essential to achieve an optimal computation-communication trade-off in the vertical edge inference system via edge device-server synergy and collaboration [82]. To further reduce the communication overheads, a communication-aware model compression

mission, massive MIMO can be adopted to combat channel fast fading and provide a nearly deterministic communication environment due to channel hardening [231]. The received multiple intermediate features can be further aggregated via the mixup augmentation technique [232] to enable scalable and cooperative inference at the edge server, as shown in Fig. 7 (d).

3) Task-Oriented Communication: As revealed in [32], there exists an intrinsic communication-computation trade-off in resource-constrained vertical edge inference. This is mainly caused by the data amplification issue in DL based inference, namely, the dimension of the intermediate feature may be larger than the input data size. Thus, if only a few layers of the neural network were deployed on the edge device, the output feature would have a size larger than the input data, yielding too much communication overhead. To reduce the intermediate feature size, more layers have to be deployed on the edge device, which however will lead to a high local computation burden. To resolve this tension between local computation and communication overhead, it is of critical importance to effectively compress and transmit the intermediate feature. Such a communication task is fundamentally different from data-oriented communication in current wireless networks, i.e., to transmit a binary sequence at the highest data rate for reliable reconstruction at the receiver. In vertical edge inference, the feature transmission is for the inference task, not for reconstructing the feature vector with high fidelity. Thus, as advocated in [32], we should rather design the communication scheme for feature transmission in a task-oriented manner, i.e., only transmitting the informative messages for the downstream inference task at the edge server. Instead
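As a minimal sketch of the communication-computation trade-off behind model split selection, the following Python snippet enumerates the candidate split points of a layered model and picks the one with the lowest end-to-end latency. The per-layer costs, feature sizes, device and server speeds, and uplink rate are hypothetical values chosen for illustration; they are not taken from [32] or [82], and accuracy and privacy are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    flops: float     # computation cost of this layer (hypothetical)
    out_bits: float  # size of this layer's output feature in bits (hypothetical)

def split_latency(layers, split, input_bits, device_flops, server_flops, uplink_bps):
    """End-to-end latency when layers[:split] run on the device and the rest on the server."""
    device_time = sum(layer.flops for layer in layers[:split]) / device_flops
    server_time = sum(layer.flops for layer in layers[split:]) / server_flops
    # split == 0 uploads the raw input; otherwise the output of the last on-device layer.
    tx_bits = input_bits if split == 0 else layers[split - 1].out_bits
    return device_time + tx_bits / uplink_bps + server_time

def best_split(layers, **system):
    """Enumerate all split points and return (best index, list of latencies)."""
    latencies = [split_latency(layers, s, **system) for s in range(len(layers) + 1)]
    return min(range(len(latencies)), key=latencies.__getitem__), latencies

# Toy 4-layer model: the first layer amplifies the 1.2 Mbit input to 4.8 Mbit.
layers = [Layer(2e8, 4.8e6), Layer(4e8, 2.4e6), Layer(4e8, 3.0e5), Layer(5e9, 320)]
system = dict(input_bits=1.2e6, device_flops=1e10, server_flops=1e11, uplink_bps=2e6)
split, latencies = best_split(layers, **system)
print(split, [round(t, 3) for t in latencies])
```

With these toy numbers the best split is after the third layer. Splitting after the first layer is slower than uploading the raw input, which is the data amplification effect: the early feature is larger than the input itself.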
of decoding the intermediate features, the received signal corrupted by channel fading and noise is directly processed at the edge server to obtain the inference results.

This task-oriented communication principle constitutes a paradigm shift for the communication system design from data recovery to task accomplishment. It was first tested in vertical edge inference via end-to-end training with joint source-channel coding in [233], which helps to reduce both the communication overhead and on-device computation cost. Such a design principle has also been applied in other tasks. For example, the DL based end-to-end semantic communication system was developed in [21] via joint semantic source and wireless channel coding for recovering the meaning of sentences instead of the original transmitted data samples. The analog JSCC approach was presented in [234] to compress and then code the feature vectors, followed by leveraging the received perturbed signal directly for wireless image retrieval at the edge server via a fully-connected neural network. Recently, a novel and generic design framework for task-oriented communication was developed in [20], which is based on the information bottleneck formulation [235]. This framework provides a principled way to extract an informative and concise representation from the intermediate feature, which is made mathematically tractable via variational approximation. Furthermore, it has been extended to the cooperative inference scenario with multiple edge devices in [86] based on distributed information bottleneck [236] and distributed source coding theory.

In summary, this section presented interference coordination techniques and task-oriented low-latency communication principles for horizontal edge inference and vertical edge inference, respectively. We hope this can motivate the co-design of wireless communication networks and deep learning models to deliver low-latency, energy-efficient and trustworthy edge AI inference services.

IV. RESOURCE ALLOCATION FOR EDGE AI SYSTEMS

In this section, we shall characterize the engineering requirements for designing communication-efficient edge AI systems, including accuracy, latency, energy, privacy and security. Effective service-driven resource allocation methods based on mathematical programming and ML are then provided to achieve scalability and trustworthiness for edge AI systems.

A. Engineering Requirements and Methodologies

We identify the engineering requirements for designing scalable and trustworthy edge AI systems. Resource allocation strategies must cater to the needs of edge AI systems for achieving accurate intelligence distillation into the edge network at an ultra-low power and low-latency cost.

1) Accuracy: The edge training process involves designing the global iterates $\theta^{[t]}$, with $t$ as the iteration index, to minimize the empirical loss function while achieving fast convergence rates with negligible optimality gap for problem (1). To design efficient resource allocation schemes in edge training systems, it is particularly important to characterize the convergence behaviors for the global iterates $\theta^{[t]}$, which typically depend on the scheduled devices, local updates, aggregation behaviors, network topologies, propagation environments, function landscapes, and underlying algorithms. Specifically, for edge training systems via AirComp, the global model aggregation errors due to the wireless channel fading and noise will cause learning performance degradation [45], [87]. The optimality gap (i.e., the distance between the current iterate and the desired solution), characterized by the convergence behavior of the global iterate, can be further controlled by various resource allocation schemes, including edge device transmit power control [41], [237], edge server receive beamforming [31], [179], passive beamforming at RIS [48], [178], as well as device scheduling policy [31], [48]. For digital design of the edge training system, the optimality basically depends on the edge device selection, packet errors in the uplink transmission, and model parameter partition, for which user scheduling [238], power control [88], batch size selection [239], aggregation frequency control [240], and bandwidth allocation [241] were provided to improve the accuracy in the edge training process.

For edge inference, the accuracy indicates the quality of the inference results for a given task. It is typically measured by the number of correct predictions from inference, e.g., in classification tasks. For computer vision applications in autonomous driving, ultra-high accuracy for the DNN model inference is demanded. For applications in radio resource allocation via distributed ML, the accuracy of inferring a DNN model can be moderate. The accuracy of edge inference depends on the difficulty of the tasks and datasets, the quality of the trained model, the dynamics of wireless communication and edge computation environments, as well as the methods for processing the models, datasets and features. In particular, for horizontal edge inference via AirComp aided wireless MapReduce, the accuracy for computing a nomographic function is fundamentally limited by the channel fading and noise, for which various transceivers were designed to minimize the mean square error for inference computation tasks [225]. The accuracy of vertical edge inference depends on the informativeness and reliability of the intermediate features transmitted from edge devices, as well as the dynamic wireless environments, for which ultra-reliable communication and adaptive JSCC approaches need to be developed to improve the inference performance. In particular, information bottleneck was adopted in [20] to characterize the relationship between the accuracy of the vertical edge inference and the communication overhead of the intermediate features.

2) Latency: For edge training, the latency consists of computation latency and communication latency. The computation latency highly depends on the computation capability of the edge devices and servers, as well as the size of the models and datasets. The communication latency is the sum of the per-round transmission latency over all learning rounds until convergence of the global model. In one typical training round, the communication delay in the uplink and downlink transmissions for model updates is mainly affected by the wireless communication techniques, bandwidth and power budgets, wireless channel conditions, as well as the scheduled edge devices. Li et al. characterized
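The latency decomposition for edge training described above can be written compactly as follows. The notation is introduced here only for illustration and is not taken from the paper: $T$ is the number of rounds until convergence, $\mathcal{S}_t$ the set of devices scheduled in round $t$, $D$ the size of a model update in bits, and $B_k$, $p_k$, $h_k$, $N_0$ the bandwidth, transmit power, channel coefficient, and noise power spectral density of device $k$. The sketch assumes synchronous rounds, so the slowest scheduled device determines the round time, and Shannon-rate uplink transmission.

```latex
T_{\mathrm{total}}
  = \sum_{t=1}^{T} \Big( \max_{k \in \mathcal{S}_t}
      \big( T^{\mathrm{cmp}}_{k} + T^{\mathrm{ul}}_{k} \big) + T^{\mathrm{dl},[t]} \Big),
\qquad
T^{\mathrm{ul}}_{k}
  = \frac{D}{B_k \log_2\!\big( 1 + \frac{p_k |h_k|^2}{B_k N_0} \big)} .
```

Under this model, anything that reduces the number of rounds $T$ at the price of a longer per-round term (for instance, sharing a fixed bandwidth among more scheduled devices) has to be weighed against the total.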
the delay distribution for FL over arbitrary fading channels via the saddle point approximation method and large deviation theory [89]. The trade-off between the convergence speed and the per-round latency was revealed in [242] based on the key observation that more scheduled devices yield a faster convergence rate while prolonging the time of uploading the local updates at each iteration due to limited radio resources. A probabilistic device scheduling policy was further proposed in [90], [243] to minimize the overall training time in wireless FL. Besides, the trade-off between the local computation rounds for local model updates and the global communication rounds for global model updates was characterized to guide the resource allocation for minimizing the total learning time and energy consumption [244]. The convergence speeds of FL algorithms were characterized in [245] by considering non-identical dataset distributions, partial edge device participation, and quantized model updates in both uplink and downlink communications.

In the case of edge inference, the latency measures the time from the data arrival to the generation of the inference results through the edge AI system. It consists of the data pre-processing, data transmission, model inference, and result post-processing, which highly depend on the computation hardware, communication schemes, DL models and tasks. For the real-time mobile computer vision application of AR/VR, stringent latency requirements are imposed, e.g., 100 ms. For scalable radio resource allocation via DL, the inference latency must be within the channel coherence time (e.g., 10 ms) to yield a meaningful resource allocation decision [23]. A low-rank matrix optimization based transceiver design approach was proposed in [52] for fast shuffling of intermediate values in wireless distributed computing, thereby reducing the latency for horizontal edge inference via edge device collaboration. For vertical edge inference, the dynamic computation partition and early exiting scheme was proposed in [82] to accelerate the inference speed via edge device-server synergy. The cross-layer design approach was adopted in [85] to reduce the communication and computation latency for the time-sensitive edge inference computing applications. In particular, the DL enabled task-oriented communication framework was developed to achieve low-latency edge device-server co-inference by merging feature compression, source coding and channel coding for the specific inference tasks [20], [234].

3) Energy: For edge training, the energy consumption consists of the energy consumed by the computation and communication processes. AlphaGo, for example, may require 280 GPUs and a $3000 electricity bill per game [246]. It is therefore critical to design energy-efficient edge training systems to minimize the carbon dioxide footprint and contribute to the carbon neutrality target. Such a design is mainly dictated by the size of training models, model training algorithms, wireless transmission strategies and hardware (e.g., the scaled SiGe bipolar technology [247]), and edge computing architectures and hardware. Both computation energy consumption for local model updates and communication energy consumption for uploading local updates are simultaneously minimized in [92] by considering the learning latency and accuracy constraints for wireless FL. The wireless power transfer approach was further adopted in [248] to power the edge devices for local model computation and communication, for which the active devices with enough harvested energy will contribute to accelerating the learning procedure. To deploy AirComp-assisted FL across massive IoT devices with a limited battery capability, microwave based wireless power transfer supported by RIS was adopted in [91] to recharge the IoT devices via energy beamforming at the edge server and passive beamforming at the RIS.

In the case of edge inference, it becomes particularly important to achieve high energy efficiency for processing the DNN models at the network edge with battery-limited devices. The energy consumption of executing a DNN model is largely dictated by the computation architecture and methods (e.g., ultra-low power compute-in-memory AI accelerator) at the edge computation nodes [249], the architecture of DNN models [250], and the wireless transmission for data exchange during the model inference procedure. For horizontal edge inference via wireless cooperative transmission at multiple edge servers, the sum of the computation and transmission power consumption for generating and delivering the inference results was minimized via downlink coordinated beamforming [53]. Energy consumption at the edge devices can be minimized in the cross-layer design for delay-sensitive edge device-server co-inference by computation offloading [85]. Besides, energy harvesting becomes a promising technology for the edge computing based vertical edge inference by providing renewable energy resources for edge devices [251].

4) Trustworthiness: Trustworthiness is one of the main drivers for developing the next generation AI technologies. Specifically, the developed AI models and algorithms must be privacy-preserving, adversarial-resilient, robust, fair, optimal and interpretable [95]. For edge training, privacy mainly depends on the offloading or coding of the raw data and intermediate features. Keeping datasets at devices is a direct and effective way to preserve users' privacy in FL. Besides, the wireless channel noise yields a noisy model aggregation procedure via AirComp, which provides an inherent privacy-preserving mechanism to enhance differential privacy for each edge device. An adaptive power control method was further developed in [41] to control the differential privacy levels in this over-the-air FL system, while avoiding the learning performance degradation. To address the adversarial attacks, the blockchain based decentralized learning was proposed in [252] to enable secure global model aggregation by using a consensus mechanism of blockchain. The block generation rate was optimized by considering the communication, computation and consensus delays in the blockchain enabled secure edge learning systems [252], [253]. For edge inference, privacy and security are mainly dictated by the way of processing the input data, of transmitting the inference results, as well as the computation methods for model inference (e.g., secure multi-party computation).

Establishing optimality for ML algorithms is important to deliver reliable and responsible AI services. However, empirical risk minimization for training the models is usually nonconvex, which poses significant challenges to guarantee global optimality for the learning algorithms and
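To illustrate how channel noise perturbs over-the-air model aggregation, and how the transmit power budget changes the size of that perturbation, here is a toy simulation. It assumes real-valued scalar updates, perfect CSI, and channel-inversion precoding with a common scaling factor `eta` limited by the weakest device. These modeling choices and all function names are assumptions made for illustration; they are not the schemes of [41] or [94].

```python
import math
import random

def aircomp_average(updates, channels, tx_power, noise_std, rng):
    """One over-the-air aggregation of scalar updates (toy real-valued model)."""
    k = len(updates)
    peak = max(abs(u) for u in updates) or 1.0
    # Largest common receive scaling every device can afford: device i sends
    # (sqrt(eta) / h_i) * u_i, which needs eta * u_i**2 / h_i**2 <= tx_power.
    eta = tx_power * min(h * h for h in channels) / (peak * peak)
    received = sum(h * (math.sqrt(eta) / h) * u for h, u in zip(channels, updates))
    received += rng.gauss(0.0, noise_std)  # additive receiver noise
    return received / (k * math.sqrt(eta)), eta

def empirical_mse(tx_power, noise_std=1.0, k=10, trials=2000, seed=0):
    """Mean squared error between the noisy over-the-air average and the true average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        updates = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        channels = [abs(rng.gauss(0.0, 1.0)) + 0.1 for _ in range(k)]  # avoid deep nulls
        estimate, _ = aircomp_average(updates, channels, tx_power, noise_std, rng)
        total += (estimate - sum(updates) / k) ** 2
    return total / trials

print(empirical_mse(tx_power=1.0), empirical_mse(tx_power=100.0))
```

Under this model the error of the averaged update is Gaussian with standard deviation `noise_std / (k * sqrt(eta))`. Raising the power budget therefore shrinks the perturbation, while lowering it leaves more noise in the aggregate, which is the kind of lever a power control scheme can use to trade accuracy against the masking effect of channel noise.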
models [180]. Fortunately, under the high-dimensional statistical setting, the local strong convexity and smoothness of the nonconvex loss functions can be exploited to tame the nonconvexity for various learning models, e.g., blind demixing [189], phase retrieval [254], and shallow neural networks [255]. Besides, with high-dimensional datasets, the nonconvex loss functions of certain statistical learning models, including over-parameterized neural networks [180] and dictionary learning [256], can enjoy a benign global geometric landscape such that all the local minima are global minima, and all the saddle points can be escaped efficiently using algorithms including the trust region method and the perturbed gradient descent method [257]. In particular, for edge training, the channel noise yields a perturbed stochastic gradient descent method to escape saddle points for distributed principal component analysis via AirComp [94]. Therefore, channel noise can provide a mechanism for both preserving differential privacy [41] and achieving global optimality [94]. This evidence indicates that we should embrace channel fading and noise for achieving trustworthy edge AI.

5) Service-Driven Resource Orchestration: Edge AI systems need to incorporate various wireless network architectures and communication strategies by integrating communication and computation. This will result in a highly complex and dynamic network, which requires innovative technologies and solutions. Various use cases (e.g., autonomous driving, industrial IoT, and smart healthcare) and heterogeneous requirements in terms of accuracy, latency, energy and trustworthiness would further aggravate the complexity for resource allocation in edge AI systems. Besides, the complex edge servers and base stations will be quite energy-consuming, which brings formidable challenges for achieving high energy efficiency. To enable efficient resource allocation, it is thus critical to precisely model the heterogeneous demands for edge AI services and, conversely, to match them with proper network resource orchestration. This, however, relies on the quantitative relationship between network resources and user requirements for edge AI tasks. To pave the way for this paradigm shift for service-driven resource allocation in edge AI systems, in the next subsection, we shall provide various intelligent optimization models and algorithms to adapt to diversified network environments and services.

B. Optimization Models and Algorithms

The service-driven network resource management problems for edge AI systems can be classified as a parametric family of mathematical optimization problems:

\begin{equation}
\begin{aligned}
\underset{z}{\text{minimize}} \quad & f_0(z; \alpha) \\
\text{subject to} \quad & g_i(z; \alpha) \le 0, \quad i = 1, \dots, m, \\
& h_i(z; \alpha) = 0, \quad i = 1, \dots, p,
\end{aligned}
\tag{2}
\end{equation}

where $z \in \mathbb{R}^n$ is the optimization variable vector consisting of both discrete and continuous variables, and $\alpha \in \mathcal{A}$ is the problem parameter vector, with $\mathcal{A}$ denoting the parameter space (e.g., CSI). For each fixed $\alpha \in \mathcal{A}$, $f_0 : \mathbb{R}^n \to \mathbb{R}$ is the objective function (e.g., optimality gap in edge training), $g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$, are the inequality constraint functions (e.g., latency requirements in edge inference), and $h_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, p$, are the equality constraint functions.

The resource allocation optimization problems are typically categorized as mixed-combinatorial optimization, nonconvex continuous optimization, stochastic optimization, and end-to-end optimization. To provide scalable, real-time, parallel, distributed and automatic resource allocation schemes, we propose to exploit the landscape of the underlying optimization problems (2) by the theory-driven method based on mathematical programming, followed by developing the novel data-driven approach based on machine learning to achieve real-time and distributed implementations, as well as improved and robust performance, as shown in Fig. 8. Here, ψ(α) is a mapping function to map the problem parameter α to the optimal solution of problem (2).

Fig. 8. Resource allocation optimization methods for edge AI systems.

1) Mixed-Combinatorial Optimization: The resource allocation problems in edge AI systems involve optimizing across learning, computation and communication. Specifically, for edge training systems, we need to jointly optimize the subcarrier and bandwidth allocation [88], [90], [241], transmit power and receive beamforming [31], [48], [178], passive beamforming at RIS [48], [178], device selection [31], [242] and activity detection [76], local updates computation [92], and global aggregation frequency control [130], thereby reducing the optimality gap and energy consumption in the distributed learning procedure. For edge inference via collaboration among edge servers, task selection, coordinated downlink beamforming among edge servers, as well as passive beamforming at RISs were jointly optimized to achieve green edge inference [53], [54]. All of these resource allocation schemes can be formulated as a mixed combinatorial optimization problem, which needs to jointly optimize continuous-valued variables (e.g., beamforming and power control) and discrete-valued variables (e.g., device selection and subcarrier allocation). In particular, sparse optimization provides a powerful modeling approach to solve the mixed combinatorial resource allocation problems by exploiting the sparsity structures in the optimal solutions [96]. For instance, the group sparsity can represent the combinatorial variables
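How group sparsity can encode a discrete selection may be sketched with the mixed l1/l2 norm and its proximal operator (block soft-thresholding). In this illustrative Python snippet, each group stands for the block of continuous variables tied to one device or task; the grouping, the numbers, and the function names are assumptions made for the example and do not reproduce a specific algorithm from the cited works.

```python
import math

def mixed_l1_l2_norm(groups):
    """Sum of the Euclidean norms of the groups (the mixed l1/l2 norm)."""
    return sum(math.sqrt(sum(x * x for x in g)) for g in groups)

def group_soft_threshold(groups, lam):
    """Proximal operator of lam * mixed l1/l2 norm: shrink each group as a block."""
    out = []
    for g in groups:
        norm = math.sqrt(sum(x * x for x in g))
        # Groups whose norm is at most lam are set to zero as a whole.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in g])
    return out

def selected(groups, tol=1e-12):
    """Indices of groups that are not entirely zero (e.g., devices kept active)."""
    return [i for i, g in enumerate(groups) if any(abs(x) > tol for x in g)]

groups = [[3.0, 4.0], [0.3, 0.4], [0.0, 2.0]]  # three hypothetical devices
shrunk = group_soft_threshold(groups, lam=1.0)
print(selected(shrunk))
```

Because a whole block is zeroed at once, the sparsity pattern across groups directly reads as a selection decision, while the surviving blocks remain continuous variables that can be refined further.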
for edge device selection in FL [31], edge device activity detection [76], and inference task selection [53]. The algorithmic advantages of the sparse optimization modeling approach are supported by various convex relaxation algorithms [198], [258], e.g., mixed $\ell_1/\ell_2$-norm minimization [80]. A typical sparse and low-rank optimization modeling and algorithmic framework was developed in [31] to support the joint device selection and transceiver design for improving learning performance in over-the-air FL systems.

Although operations research provides a theory-driven approach for solving the mixed combinatorial optimization problem or its equivalent sparse optimization problem, the existing algorithms are either heuristic with noticeable performance loss or optimal with intolerably high computation complexity. To address these challenges, “learning to optimize” provides a data-driven design paradigm to improve the computation efficiency and system performance for resource allocation [114], [259]. This is achieved by developing computationally efficient optimization methods by learning from the sampled problem instances using training models and methods. The learned algorithms can be further executed online and in a distributed manner for real-time resource allocation in edge AI systems. To solve the large-scale mixed combinatorial optimization problem efficiently, imitation learning was adopted in [22] to learn an aggressive pruning policy in the globally optimal branch-and-bound algorithm. This learning based branch-and-bound method can significantly save the time for pruning the nodes in the search tree, achieve near-optimal performance with few training samples, as well as guarantee feasibility of constraints without performance degradation. To further speed up the sparse optimization method for the mixed combinatorial optimization problem in edge device activity detection, the DNN based algorithm unrolling framework was developed in [97] to achieve theoretical guarantees, performance improvements, interpretability and robustness for the learned sparse optimization algorithms [186]. This is achieved by mapping the theory-driven iterative operations, i.e., those of the iterative shrinkage thresholding algorithm, into an unrolled recurrent neural network, followed by training the model parameters based on supervised learning. Besides, a multi-agent RL approach was developed in [260] to solve the distributed mixed combinatorial optimization problem for task offloading and resource allocation in multi-layer edge inference systems.

2) Nonconvex Optimization: Most of the resource allocation problems in edge AI need to solve a series of nonconvex optimization problems, e.g., nonconvex sparse optimization for device selection in wireless FL, nonconvex quadratic programming for transceiver design in over-the-air FL [31], low-rank matrix optimization for interference management in edge device distributed inference [52], and unit modulus constrained phase shift optimization [261] in RIS-empowered edge AI systems [48], [54]. Convex approximation provides a natural way to design polynomial time complexity algorithms for nonconvex programs based on the principle of majorization-minimization [262] or successive convex approximation [263]. A two-stage framework was provided in [98] for solving general large-scale convex programs with infeasibility detection and scalable computation. This is achieved by the matrix stuffing technique for fast conic program modeling in the first stage, and the operator splitting method for scalable conic program solving in the second stage [98]. Although the semidefinite relaxation approach [264] is able to convexify the general quadratic programs by matrix lifting and dropping the resulting rank constraints, it fails to return high quality solutions in the high-dimensional settings. This issue was addressed by a difference-of-convex-functions (DC) programming approach [31], [52], [99], which represents the rank function via an equivalent DC function. This DC optimization modeling and algorithmic framework has been applied to solve the nonconvex passive beamforming problem in the RIS-empowered FL systems [48] and edge inference systems [54]. To solve the large-scale rank constrained matrix optimization problems, Riemannian manifold optimization was proposed to optimize such nonconvex programs directly by exploiting the manifold geometric structures of fixed-rank matrices [100], [101].

To further enable real-time, automatic and distributed design of nonconvex optimization algorithms for resource allocation in edge AI systems, DL was shown to have great potential. A multi-layer perceptron was adopted in [265] to directly learn the mapping from the problem instance to the output solution generated by the weighted minimum mean square error (WMMSE) algorithm for nonconvex precoding design [266]. Instead of running the iterates, the learned algorithm via deep learning can be executed in real time, as neural networks only involve computationally cheap operations, e.g., matrix-vector multiplication. To reduce the model and sample complexity, as well as improve the performance and interpretability, unfolded neural networks were developed in [267]–[269] to parameterize the iterative policy via unfolding one iteration of the existing structured algorithm into one layer of a neural network. Graph neural networks (GNNs) have recently been shown to be able to harness the benefits of generalizability, interpretability, robustness, scalability, superior performance, and real-time and distributed implementation for learning to optimize nonconvex problems, including power control [270], beamforming [23], and phase shift design [25]. This is achieved by modeling the wireless network as a graph, followed by using a GNN to parameterize the mapping function ψ(α) for the optimal solution.

3) Stochastic Optimization: In large-scale edge AI systems, the estimated CSI will be inevitably imperfect or partially available [177], [271]. It is thus critical to design practical resource allocation schemes by considering the CSI uncertainty, for which robust optimization and stochastic optimization are two typical approaches. Specifically, the robust optimization approach aims at guaranteeing the worst-case but conservative performance over the uncertainty set. The robust optimization method can usually yield computationally tractable optimization models [102]. The stochastic optimization approach, e.g., chance constrained programming, only relies on the probabilistic description of the uncertainty of the problem parameter α in problem (2) and is able to provide a trade-off between conservativeness and probabilistic guarantees for the achievable performance [272]. In particular, a statistical learning approach was presented in [53] to learn a tractable uncertainty set to approximate the chance constrained
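The algorithm unrolling idea can be sketched with plain ISTA for a LASSO problem, where each iteration plays the role of one network layer. This is a minimal pure-Python sketch with a fixed step size and threshold; a learned variant would instead treat the per-layer step sizes and thresholds as trainable parameters, which is not shown here. The matrix `a`, the observation `y`, and all function names are illustrative assumptions.

```python
def soft_threshold(v, tau):
    """Elementwise shrinkage operator used by ISTA."""
    return [max(abs(x) - tau, 0.0) * (1 if x > 0 else -1) for x in v]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def ista_layer(x, a, y, step, lam):
    """One ISTA iteration, i.e., one layer of the unrolled network."""
    residual = [r - yi for r, yi in zip(matvec(a, x), y)]
    grad = matvec(transpose(a), residual)
    return soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)

def unrolled_ista(a, y, lam, num_layers):
    # The step 1 / ||A||_F^2 is a conservative choice that keeps the objective
    # non-increasing, since ||A||_F^2 upper-bounds the largest eigenvalue of A^T A.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * len(a[0])
    for _ in range(num_layers):
        x = ista_layer(x, a, y, step, lam)
    return x

def lasso_objective(x, a, y, lam):
    r = [ri - yi for ri, yi in zip(matvec(a, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

a = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.2], [0.3, 0.1, 1.0]]
y = matvec(a, [2.0, 0.0, 0.0])  # observation generated by a sparse vector
x_hat = unrolled_ista(a, y, lam=0.1, num_layers=200)
print([round(v, 3) for v in x_hat])
```

Stacking a fixed, small number of such layers and training their parameters on sampled problem instances is what turns the iterative algorithm into a network that can be executed with a predictable, low per-instance cost.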
programming for achieving high computation efficiency and activity sparsity [185], were exploited to reduce the training
system performance in the energy-efficient edge inference overhead.
systems. However, due to the limited historical samples, it is However, all of the above works follow the “estimate-
difficult to characterize the true probability distribution for then-optimize” framework by first performing pilots-based
the CSI uncertainty. Distributionally robust optimization [273] channel estimation, followed by allocating resources based on
provides a promising way to achieve worst-case probabilistic the estimated CSI. However, this two-stage approach fails to
performance by incorporating all sample-generating distrib- achieve a low signaling overhead and superior system per-
utions into an ambiguity set. However, finding the globally formance. Although the low-dimensional structures have been
optimal solution for this method is often computationally exploited for designing efficient channel estimation methods,
intractable. the additional information (e.g., user location and mobility),
DL provides an alternative way to address the uncertainty are difficult to be modelled and incorporated into a unified
and dynamics of environment parameters to achieve modeling mathematical model for CSI acquisition overhead reduction,
flexibility and computational efficiency for resource allocation which may exceed latency. Besides, the artificially defined
in the complicated edge AI systems. Specifically, DL can criterion (e.g., mean square error) for channel estimation may
provide acceptable performance for resource allocation based not be aligned with the ultimate goal for resource allocation in
only on geographic locations information of the transmitters edge AI systems. To address this challenge, a DL approach has
and receivers [274]. By considering CSI variations [55], [275], recently been proposed to merge the two stages into an “end-
[276] and stochastic task arrivals [84], [85], the dynamic to-end optimization” framework for resource allocation [104].
communication and computation resource allocation problem This is achieved by directly mapping the received pilots (i.e.,
can be formulated as a MDP, for which deep RL, a model- the problem parameters α in (2) can be the received pilots) into
free approach, can provide efficient and robust solutions [73]. the resource allocation policy without explicit channel estima-
Besides, the learned algorithms can be distributively exe- tion. This mapping function is further parameterized by a DNN
cuted in the multi-agent edge AI systems. However, due to capture the inherent structures of the resource allocation
to the distribution shift for system parameters in episodi- problems. For instance, the GNN was adopted to model the
cally dynamic environment, the trained model may suffer permutation invariant and equivalent properties of the mapping
from performance deterioration when the dataset follows a function for resource allocation in the RIS empowered TDD
different distribution in the inference stage [22]. Transfer wireless networks [25]. The neural calibration approach [286]
learning [103] and continual learning [277] have recently been was developed in FDD massive MIMO systems to map the
adopted to address such task mismatch issue in the “learning received pilots at edge devices into feedback bits, followed
to optimize” framework considering the system distribution by directly mapping the feedback bits into the downlink
dynamics. beamformers [104].
4) End-to-End Optimization: Channel estimation plays In summary, this section presented the operation research
a pivotal role to support effective resource allocation in based theory-driven and ML based data-driven methods for
large-scale edge AI systems [198], [206]. In particular, designing effective, real-time, distributed and robust resource
exploiting the low-dimensional structures of wireless channels allocation strategies in edge AI systems. We hope these results
becomes a promising way to address the curse of dimen- can stimulate more service-driven resource allocation methods
sionality for CSI acquisition in various networks. Specifi- (e.g., network slicing [287]) and optimization approaches (e.g.,
cally, in ultra-dense Cloud-RAN, a high-dimensional struc- multi-objective optimization [288]). The presented “learning to
tured channel estimation framework was proposed in [278] optimize” framework is also promising for resource allocation
by inducing the spatial sparsity and temporal correlation prior in various future wireless networks.
information using a convex regularizer. Sparsity structures of
a massive MIMO channel were exploited in [279] to reduce V. ARCHITECTURE FOR EDGE AI SYSTEMS
the training overheads for CSI acquisition. The signal super- In this section, we present a new mobile network architec-
position property of a wireless multiple access channel was ture for edge AI systems, supported by the wireless network
exploited to directly obtain the weighted sum of channels infrastructures in Section II and Section III, as well as the
for receive beamformer design, thereby avoiding global CSI service-driven resource allocations in Section IV. We will
estimation [280]. The sparsity in the activity pattern was provide an end-to-end (E2E) architecture design across the
leveraged to develop the sparse signal processing framework network infrastructure, data governance, network function,
for joint activity detection and channel estimation in grant-free network management, as well as operations and applications.
massive access [75]. Due to the passive nature of RIS,
it becomes infeasible to directly perform signal processing
for channel estimation at RIS and the cascaded channel A. End-to-End Architecture for Edge AI Systems
can only be estimated either at the edge servers or edge For each new generation of mobile networks, new services
devices [206]. To address this unique challenge, the common and capabilities have been introduced at the architecture level
reflective channels among all edge devices [281], quasi-static in order to meet more and typically more stringent demands.
property between RIS and edge server channel links [282], The mobile network was originally designed to deliver voice
[283], spatial features of noisy channels and additive nature services. Since then, both the architecture and deployment of
of noises [284], as well as channel sparsity [285] and device mobile networks have followed a centralized and hierarchical
Fig. 9. E2E architecture for edge AI systems with radio computing nodes (RCN) to allow seamless integration of communication and computing capabilities.
New independent computing planes (CmP) in RCN will also be used to host AI tasks and collaborate with communication functions in the control & user
planes (CP and UP).
paradigm that reflects the nature of voice traffic and packet traffic of the mobile internet. To realize the vision of “connected intelligence”, 6G will break and shift these traditional paradigms towards a novel architecture and design that meets new requirements for the deep integration of communication, AI, computing, and sensing at the network edge, with new integrated capabilities empowered by evolutionary as well as revolutionary enabling technologies.
Under this new design philosophy, we introduce a holistic E2E architecture for scalable and trustworthy 6G edge AI systems, as illustrated in Fig. 9. By providing new wireless network infrastructures, enabling efficient data governance, integrating communication and computation at the network edge, as well as performing automated and scalable edge AI management and orchestration, the proposed E2E architecture will provide a scalable and flexible platform to support diversified edge AI applications with heterogeneous service requirements.
B. Data Governance
Due to the expected huge energy consumption, as well as security and privacy concerns, we envision that data in future 6G networks will need to be collected, processed, stored and consumed at the network edge. Since data and AI applications in 6G are expected to be much more diverse than ever before, it is imperative to provide a unified and efficient data governance framework at the architecture level. Data governance goes far beyond conventional data collection and storage: it also considers data availability and quality, data sovereignty, knowledge management and legal implications. Data governance must also consider the mechanisms to comply with the regional or national data protection policies and regulations of the data source in terms of usage rights and obligations, such as GDPR.
1) Independent Data Plane: 5G has introduced a new network data analytics function (NWDAF) in the core network to implement AI-based network automation, optimize the related network functions (e.g., AI-based mobility management [105]), and improve user service experience. One of its main goals is to collect and analyze data from other 5G network elements to train AI models and implement AI inference for automated and scalable network optimization. Meanwhile, a similar mechanism, which collects and analyzes data based on the existing SON/MDT (self-organizing networks and minimization of drive tests), was adopted for 5G radio access networks (RAN). In 6G, such separated data collection and analytics mechanisms need to evolve to a unified and more efficient paradigm. An independent data plane in 6G could contribute to organizing and managing data efficiently while also considering privacy protection [28]. This paves the way for natively embedding edge AI into the 6G networks by leveraging multi-domain data.
2) Multi-Player Roles: The data governance ecosystem includes different roles: data customer, data provider, data owner, data steward, etc. These could be taken by the same or different business entities, including individual users. Hence, data governance is a typical scenario that involves multiple players. It thus becomes essential to establish a multi-party data trading platform to negotiate data rights and prices among different business entities while achieving trustworthiness, fairness and efficiency. This can be achieved using decentralized technologies such as blockchain with smart contract design [28], [106]. This will improve data efficiency and the business ecosystem for the deployment of edge AI.
C. Deeply Converged Communication and Computing at the Edge
In 5G, superior performance has been achieved by integrating AI capability into the RAN [289], [290]. For instance, we can optimize radio resource scheduling and mitigate interference using machine learning methods [23]. Such utilization of AI in 5G can be referred to as AI for networks. The targets of edge AI are not only AI for networks, but also networks for AI [12], as presented in Section II and Section III. This will depend on the new functional capabilities of future networks, including how to make computing a foundational capability of future 6G networks. A new type of radio equipment may emerge, which we refer to as a radio computing node (RCN), which allows the computing resources to be seamlessly converged with the communication capability. This will require the introduction of a new independent computing plane (CmP) in the RCN to host AI tasks and collaborate with the communication functions in the control plane (CP) and user plane (UP) [28]. This will also enable the flexible integration of computation, communication and intelligence for edge AI.
D. Edge AI Management and Orchestration
Edge AI involves a diverse set of learning models and algorithms, network infrastructures, as well as complicated collaboration for communication, computation and intelligence. Developing a framework for edge AI management and orchestration thus becomes an essential aspect of designing native AI support at the architecture level. This framework needs to be designed so as to facilitate the seamless integration and deployment of AI services, especially from third parties. This can be achieved by planning, deploying, maintaining, and optimizing the decentralized machine learning models and algorithms, as well as the edge network infrastructures and functions. The edge AI management and orchestration shall also include AI workflow, distributed and streaming data, along with heterogeneous network resources, etc. Scale and cross-domain issues will be huge challenges for such a framework, and this may involve complicated standardization efforts. Hence, building such a new framework that fully relies on standardization may not be feasible. We may instead leverage the open-source approach [28] to commercialize some of the components in this framework.
In summary, this section presented the edge AI system architecture from an E2E perspective, detailing its network infrastructure, data governance, edge network function, as well as edge AI management and orchestration. The standardization efforts, hardware and software platforms, and application scenarios will be further discussed in the next section. We hope this novel E2E architecture can stimulate more innovative and out-of-the-box ideas for the evolution of edge AI system architectures.
VI. STANDARDIZATIONS, PLATFORMS, AND APPLICATIONS
In this section, we will first discuss the standardization for edge learning models and algorithms, as well as integrated computing functionalities at the network edge. The research-oriented and production-oriented platforms are then provided, including distributed optimization based FL software, large-scale optimization based resource allocation solvers, as well as edge AI computing and communicating hardware. To accelerate commercialization for edge AI, the application scenarios are also investigated, including autonomous driving, industrial IoT, and smart healthcare.
A. Standardizations
The standardization of 6G will not be limited to the communications part, but will also cover the deep integration of communications, intelligence, and computing. The 3rd Generation Partnership Project (3GPP) may start an overall study into 6G systems around the end of 2025 (3GPP Release 20), while starting research into technical specifications around the end of 2027 [28]. In this subsection, we will introduce the standardizations on trustworthy edge learning models and algorithms, as well as wireless computing functionalities implemented in digital or analog communication systems.
1) Learning: The first technical standard for FL was approved in March 2021 as IEEE 3652.1-2020 [107], IEEE Guide for Architectural Framework and Application of Federated Machine Learning. This IEEE standard for FL was developed by the Learning Technology Standards Committee of the IEEE Computer Society with participants from the shared machine learning working group, including 4Paradigm, AI Singapore, Alipay, Huawei, JD iCity, Tencent, WeBank, and Xiaomi, etc. Specifically, the IEEE 3652.1-2020 standard provides the guidelines for architectures and categories of FL from the perspectives of data, user and system, followed by identifying the associated application scenarios, performance evaluations, and regulatory requirements. Standardization plays a vital role in creating private and secure FL ecosystems at large scale to provide consumer products and services in the market. Besides, various standards for data privacy and security have been developed by the information security, cybersecurity and privacy protection technical committee from the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). For example, ISO/IEC TS 27570 [291], Privacy Protection - Privacy Guidelines for Smart Cities, provides guidelines and recommendations for the management of privacy and the
usage of standards. ISO/IEC DIS 27400 [292], Guidelines for Security and Privacy in Internet of Things (IoT), provides guidance on principles and controls to provide private and secure IoT systems, services and solutions. The technical committee on cybersecurity of the European Telecommunications Standards Institute (ETSI) has recently unveiled ETSI EN 303 645 [293], Cyber Security for Consumer Internet of Things: Baseline Requirements, to provide a cybersecurity standard and baseline for IoT consumer products and certification schemes. All these standards are applicable for developing private and secure edge AI models and algorithms to provide trustworthy products and services.
2) Computing: The computing functionality can be implemented in wireless networks by either digital modulation or analog modulation. Specifically, MEC provides a promising solution for deploying edge AI systems in current wireless systems with digital modulation [115]. The standardization activities on MEC thus pave the way to integrate edge AI into mobile networks at a mature level. Specifically, the ETSI ISG MEC (Industry Specification Group for Multi-access Edge Computing) has established a standardized and open ecosystem for both edge-aware and edge-unaware applications at the network edge. It has published a set of white papers and specifications covering user equipment application, service application, as well as management, mobility, and orchestration related application programming interfaces (APIs). Besides, 3GPP 5G specifications define the key enablers and architectures for edge computing to allow traffic routing, policy control, and network management for collaboration between a MEC system and a 5G system [294]. The collaboration between the two independent systems of MEC and 5G can be further optimized in 6G, where communication and computing can be converged into one system by adding a computing plane [28]. In particular, ETSI ISG MEC has recently developed a synergized mobile edge cloud architecture by leveraging and harmonizing the existing and ongoing standards (including 3GPP, ETSI ISG MEC, GSMA, and 5GAA) [108]. Although 5G is being rolled out globally, modern mobile systems are widely deployed based on digital modulation instead of analog modulation [295]. To support analog communication based AirComp for edge training in current wireless networks [6], one may either directly leverage the existing digital modulator with quantized analog signals or introduce an additional analog modulator with a matched filter for decoding the received signals [37]. Clearly, more effort is needed to incorporate AirComp functionalities into the future 6G standards to mature edge AI systems.
B. Platforms
We present the software and hardware platforms for deploying edge AI models and algorithms, as well as the optimization solvers for resource allocation in edge AI systems.
1) Software: There is a rapidly growing body of software platforms for simulations and productization of edge AI algorithms and models. FL libraries such as TensorFlow Federated, LEAF, and PySyft have provided excellent open software frameworks for FL simulations and evaluations. To further accelerate research progress and facilitate algorithmic innovation and performance comparison in realistic FL environments, FedML [109], a research-oriented open FL library, has recently been established to support diverse FL computing environments and topological architectures with standardized FL algorithm implementations and benchmarks. As a production-oriented software project, FATE [110] has been developed by WeBank’s AI Department for the financial industry by supporting various secure computing protocols and FL architectures. Besides, existing edge computing frameworks (e.g., Baidu “Baetyl” and Huawei “KubeEdge”) provide promising solutions to deliver edge AI services. For edge AI empowered IoT applications, Microsoft “Azure IoT Edge”, Google “Cloud IoT”, Amazon “Web Services (AWS) IoT” and NVIDIA “EGX” provide edge AI platforms to bring real-time AI services across a wide range of applications, including smart retail, home, manufacturing, and healthcare. Huawei has recently released a next-generation operating system, HarmonyOS [111], to enable seamless collaboration and interconnection among smart edge devices across diverse platforms. This empowers connected intelligence by deploying edge AI in the operating systems.
2) Solver: Resource allocation for edge AI systems and wireless networks is booming through the development of various large-scale optimization models and algorithms. General-purpose large-scale optimization software solvers are important to enable rapid prototyping and deployment of resource allocation optimization algorithms for edge AI systems. Specifically, CVX [112] provides a two-stage software framework for modeling and solving general large-scale convex optimization problems. This is achieved by automatically transforming the original problem instances into standard conic programming forms, followed by calling advanced off-the-shelf conic solvers, e.g., MOSEK [296] and SCS [113]. To further speed up the modeling phase and avoid repeatedly parsing and re-generating conic forms, a matrix stuffing technique was presented in [98] to generate the mapping function between the original problem and the conic form in a symbolic way instead of the time-consuming numerical way using CVX. It is thus particularly interesting to develop a solver to automatically generate the mapping functions for conic transformation in symbolic form. Besides, Gurobi [297] and MOSEK [296] are among the fastest solvers for general mixed-integer second-order conic programs. Chen et al. recently released the software package “Open-L2O” [114] to implement the “learning to optimize” framework for benchmarking performance fairly and designing algorithms automatically.
3) Hardware: The achievable performance and benefits of edge AI systems are conditioned upon the availability of edge AI computing hardware and radio frequency (RF) hardware technologies. Specifically, edge AI computing hardware can be categorized as graphics processing unit (GPU)-based hardware (e.g., NVIDIA’s GPUs), field programmable gate array (FPGA)-based hardware (e.g., Xilinx’s SDSoC), and application specific integrated circuit (ASIC)-based hardware (e.g., Google’s TPU). The detailed comparisons for various edge AI computing hardware can be found in [115]. In particular, the chip design procedure for edge AI hardware can be
significantly accelerated by the recent proposal of deep RL assisted fast chip floorplanning [298]. Besides, the massive broadband connectivity requirements for edge AI systems motivate innovations in RF hardware technologies. The benefits of RIS-empowered FL systems highly depend on the capabilities of manipulating electromagnetic waves at the metasurfaces [49], whose reconfigurability is typically enabled by switches, tunable materials, topological metasurfaces, and hybrid metasurfaces [299]. THz communication, with a frequency band of 0.1-10 THz, is envisioned as a promising enabler for achieving sensing, communication, and learning in an integrated edge AI system. To approach this THz region, RF hardware technologies and solutions were thoroughly investigated in [116], including semiconductor circuits, antenna forms, and the packaging and testing of transceivers.
C. Applications
We discuss edge AI enabled application scenarios, which inspire new communication algorithms, resource allocation optimization algorithms, as well as data processing methods.
1) Autonomous Driving: Autonomous driving basically refers to self-driving vehicles that move without the intervention of human drivers. Self-driving vehicles integrate various innovative technologies, including advanced sensor technologies, new energy automobiles, next generation AI technologies, as well as future vehicular networks. Autonomous driving can significantly improve safety, passenger comfort, travel and logistics efficiency, collision avoidance, and energy efficiency. Edge AI shall play a pivotal role in achieving ultra-low latency communication, intelligent networking, real-time data analytics, as well as high security for intelligent vehicles [27], [117]. A general DL framework was proposed in [26] to enable ultra-reliable and low-latency vehicular communication by incorporating domain knowledge, including information theoretical tools and cross-layer optimization design. To minimize the vehicles’ queuing latency, an FL approach was developed in [300] to learn the tail distribution of the queue lengths. To cope with the high mobility and heterogeneous structures in vehicular networks, DL becomes powerful for dynamic resource allocation [24] and network traffic control [27]. In particular, edge AI techniques, including distributed RL [55], [301], decentralized GNN [23], as well as distributed DNN with binarized output layer [61], are able to learn and execute the distributed resource allocation policies in an automatic and real-time manner.
The data processing tasks for autonomous driving mainly include perception, high-definition (HD) mapping, as well as SLAM [117], [302]. Specifically, to understand the environments for intelligent decision making, various sensory data from onboard sensors (e.g., light detection and ranging (LiDAR), cameras, radar and sonar) need to be processed for the perception tasks, including localization, object detection and tracking. The perception capability can be enhanced by edge AI systems, e.g., edge device-server co-inference of DNN models for vision based perception tasks [303]. HD mapping aims at constructing a representation of the vehicles’ operating environments, e.g., obstacles, landmark positions, curvature and slope. This is imperative to achieve highly accurate localization for autonomous driving. The edge server cooperative inference method in Section III-A.2 can be adopted to reduce the storage and communication overheads for updating the HD map by collecting fresh data from the vehicles in dynamic environments [117]. SLAM comprises simultaneously estimating the state of a vehicle and constructing a map of the environment [304], which paves the way for achieving full autonomy in autonomous driving [305]. Edge SLAM [56], [57] has recently been developed to execute DL based visual SLAM algorithms on edge vehicles. This is achieved by deploying the tracking computation parts on the edge vehicles while offloading the remaining parts (e.g., local mapping and loop closure) to the roadside edge server via vertical edge inference in Section III-B.
2) Internet of Things: Artificial Intelligence of Things (AIoT) leverages AI technologies and IoT infrastructures to improve human-machine interactions and enable multi-agent communications and collaborations. AIoT goes beyond the conventional communication paradigm for audio, video and data delivery. It will enable semantic communication [58] to exchange semantic information among agents. Shannon and Weaver categorize communication into three levels: the transmission level (i.e., transmit symbols accurately), the semantic level (i.e., convey the desired meanings precisely), and the effectiveness level (i.e., produce the desired actions effectively) [306]. Semantic communication is able to significantly improve the communication efficiency by only transmitting the extracted relevant information for semantic information delivery tasks, with the semantic error as the performance metric. A distributed edge DL approach has been recently developed in [58] to enable low-latency semantic communication over IoT networks. This is achieved by jointly optimizing the compressed DNN based transmitters at the edge IoT devices and the quantized DNN based receivers at the edge server over the wireless fading channels.
Industrial IoT (IIoT) is a production-oriented industrial network for connecting industrial devices and equipment, processing and exchanging generated data, as well as optimizing the production system [307]. Besides, the digital twin is becoming a key technology for smart manufacturing in Industry 4.0 by connecting physical machines and digital representations in a cyber-physical system [308], [309]. This is achieved by providing a virtual representation of the industrial entities and products’ life-cycle to predict and optimize the behaviors of the manufacturing process. Edge AI provides a promising way to model and deploy digital twins for IIoT networks to process the high volume of industrial streaming data with low-latency and high-security guarantees. Specifically, edge computing provides a general platform for inferring DNN models via computation offloading to reduce network latency and operation cost in IIoT [119]. FL becomes a key enabling technology to support intelligent IIoT applications (e.g., smart grid and smart manufacturing) and provide IIoT services (e.g., data offloading and mobile crowdsensing) [120]. In particular, blockchain empowered FL was proposed in [310] to provide secure communication and private data sharing schemes for constructing digital twin IIoT networks, followed by
reducing communication overheads via asynchronous model aggregation.
3) Smart Healthcare: Smart healthcare aims to realize a common platform for efficient and personalized healthcare, intelligent health monitoring, and precision medicine development via collaboration among multiple participants (e.g., doctors, patients, hospitals, and research institutions). This is achieved by emerging advanced technologies, including DL [311], [312], Tactile Internet, IoT, edge AI, and wireless communications. In particular, edge AI with distributed and secure DL has been demonstrated to be able to significantly improve the reliability, accuracy, scalability, privacy and security for precision medicine and Internet of Medical Things [313], including medical imaging, drug development, and chronic disease management [314]. Specifically, Kaissis et al. [315] presented an FL approach for medical imaging to preserve privacy and avoid potential attacks against the datasets or learning algorithms. Besides, swarm learning has recently been developed in [35] to provide a decentralized and confidential clinical disease detection solution for diseases (e.g., COVID-19, tuberculosis, and leukaemia). This is achieved by leveraging the blockchain and edge computing techniques to develop a secure and private decentralized learning architecture while keeping the medical data local. MIT Media Lab established a split learning project to allow health entities to collaborate in training patient diagnostic models without sharing sensitive raw data [316]. An RL approach for decision making in patient treatment was introduced in [317] to realize safe and risk-conscious healthcare practice.
Haptic communication [121] aims at delivering the skill set (e.g., the manipulation skills representation learned from the multisensory tactile and visual data [318], and the signatures of the human grasp learned using a tactile glove [319]) over the Tactile Internet in an ultra-reliable and low-latency manner. It has potential in healthcare applications including tele-diagnosis, tele-rehabilitation, and tele-surgery, which have turned out to be essential during the ongoing COVID-19 pandemic. Edge AI becomes a key enabling technique for the Tactile Internet with human-in-the-loop to facilitate ultra-responsive and truly immersive tactile actuation in the tele-operation systems [320]. This is achieved by enabling the network edge with intelligent prediction capability for haptic information (e.g., tactile feedback and control traffic) [321], as well as the intelligent resource allocations across the whole network layers [322]. Specifically, a distributed optimization framework was developed in [323] to design an edge computing assisted Tactile Internet for achieving both ultra-low latency and high energy efficiency. Such a distributed optimization algorithm can be further learned via the distributed DL techniques [61]. Besides, a variational optimization framework was proposed in [324] to achieve low latency and high reliability for massive access in the Tactile Internet. The variational decision function can be further parameterized via DNNs with the capability of distributed training and inference for practical deployments [23], [324].
In summary, this section presented standardizations, platforms, and applications for practical deployment of edge AI systems. Combining the presentations of edge training in Section II, edge inference in Section III, resource allocation in Section IV, and system architecture in Section V, we complete the roadmap for the edge AI ecosystem, as shown in Fig. 2. We hope these results can encourage more communities and stakeholders to engage in industrializing and commercializing edge AI in the era of 6G.
VII. CONCLUSION
Embedding low-power, low-latency, reliable, and trustworthy intelligence into the network edge is an inevitable trend and disruptive shift in both academia and industry. Edge AI serves as a distributed neural network to imbue connected intelligence in 6G, thereby enabling intelligent and seamless interactions among the human world, physical world, and digital world. The challenges for building edge AI ecosystems are multidisciplinary, spanning wireless communications, machine learning, operations research, domain applications, regulations and ethics. In this paper, we have investigated the key wireless communication techniques, effective resource management approaches and holistic network architectures to design scalable and trustworthy edge AI systems. The standardizations, platforms, and applications were also discussed for productization and commercialization of edge AI. We hope that this article will serve as a valuable reference and guideline for further considering edge AI opportunities across theoretical, algorithmic, systematic, and entrepreneurial considerations to embrace the exciting era of edge AI.
REFERENCES
[1] Resilient and Intelligent NextG Systems (RINGS). [Online]. Available: https://www.nsf.gov/pubs/2021/nsf21581/nsf21581.pdf
[2] Expanded 6G Vision, Use Cases and Societal Values. [Online]. Available: https://hexa-x.eu/wp-content/uploads/2021/05/Hexa-X_D1.2.pdf
[3] IMT-2030 (6G) Promotion Group, 6G Vision and Candidate Technologies. [Online]. Available: http://www.caict.ac.cn/english/news/202106/P020210608349616163475.pdf
[4] Network 2030: A Blueprint of Technology, Applications and Market Drivers Towards the Year 2030 and Beyond, document FG-NET-2030, ITU Focus Group on Technologies for Network, May 2019.
[5] J. G. Andrews et al., “What will 5G be?” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[6] M. Shafi et al., “5G: A tutorial overview of standards, trials, challenges, deployment, and practice,” IEEE J. Sel. Areas Commun., vol. 35, no. 6, pp. 1201–1221, Jun. 2017.
[7] Huawei 5.5G. [Online]. Available: https://www.huawei.com/en/news/2020/11/mbbf-shanghai-huawei-david-wang-5dot5g
[8] H. Tataria, M. Shafi, A. F. Molisch, M. Dohler, H. Sjöland, and F. Tufvesson, “6G wireless systems: Vision, requirements, challenges, insights, and opportunities,” Proc. IEEE, vol. 109, no. 7, pp. 1–34, Jul. 2021.
[9] M. Maier, A. Ebrahimzadeh, S. Rostami, and A. Beniiche, “The internet of no things: Making the internet disappear and ‘see the invisible,’” IEEE Commun. Mag., vol. 58, no. 11, pp. 76–82, Nov. 2020.
[10] X.-H. You et al., “Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts,” Sci. China Inf. Sci., vol. 64, no. 1, pp. 1–74, Jan. 2021.
[11] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, May 2020.
[12] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J.-A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, Aug. 2019.
[13] S. Ali, W. Saad, D. Steinbach, I. Ahmad, and J. Huusko, “White paper on machine learning in wireless communication networks,” 6G Res. Vis., no. 7, pp. 1–36, Jun. 2020.
[14] J. Wang et al., “Interplay between RIS and AI in wireless communications: Fundamentals, architectures, applications, and open research problems,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2271–2288, Aug. 2021.
[15] H. Kim, Y. Jiang, R. B. Rana, S. Kannan, S. Oh, and P. Viswanath, “Communication algorithms via deep learning,” in Proc. Int. Conf. Learn. Represent. (ICLR), Apr. 2018, pp. 1–19.
[16] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[17] J. Hoydis, F. A. Aoudia, A. Valcarce, and H. Viswanathan, “Toward a 6G AI-native air interface,” IEEE Commun. Mag., vol. 59, no. 5, pp. 76–81, May 2021.
[18] Y. M. Saidutta, A. Abdi, and F. Fekri, “Joint source-channel coding over additive noise analog channels using mixture of variational autoencoders,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2000–2013, Jul. 2021.
[19] E. C. Strinati and S. Barbarossa, “6G networks: Beyond Shannon towards semantic and goal-oriented communications,” Comput. Netw., vol. 190, May 2021, Art. no. 107930.
[20] J. Shao, Y. Mao, and J. Zhang, “Learning task-oriented communication for edge inference: An information bottleneck approach,” 2021, arXiv:2102.04170.
[21] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
[22] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “LORM: Learning to optimize for resource management in wireless networks with few training samples,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 665–679, Jan. 2020.
[23] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 101–115, Jan. 2021.
[24] L. Liang, H. Ye, G. Yu, and G. Y. Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,” Proc. IEEE, vol. 108, no. 2, pp. 341–356, Feb. 2020.
[25] T. Jiang, H. V. Cheng, and W. Yu, “Learning to reflect and to beamform for intelligent reflecting surface with implicit channel estimation,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 1931–1945, Jul. 2021.
[26] C. She et al., “A tutorial on ultrareliable and low-latency communications in 6G: Integrating domain knowledge into deep learning,” Proc. IEEE, vol. 109, no. 3, pp. 204–246, Mar. 2021.
[27] F. Tang, Y. Kawamoto, N. Kato, and J. Liu, “Future intelligent and secure vehicular network toward 6G: Machine-learning approaches,” Proc. IEEE, vol. 108, no. 2, pp. 292–307, Feb. 2020.
[28] W. Tong and P. Zhu, 6G: The Next Horizon: From Connected People and Things to Connected Intelligence. Cambridge, U.K.: Cambridge Univ. Press, 2021.
[29] Y. Shi, K. Yang, T. Jiang, J. Zhang, and K. B. Letaief, “Communication-efficient edge AI: Algorithms and systems,” IEEE Commun. Surveys Tuts., vol. 22, no. 4, pp. 2167–2191, 4th Quart., 2020.
[30] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proc. IEEE, vol. 107, no. 8, pp. 1738–1762, Aug. 2019.
[31] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2022–2035, Mar. 2020.
[32] J. Shao and J. Zhang, “Communication-computation trade-off in resource-constrained edge inference,” IEEE Commun. Mag., vol. 58, no. 12, pp. 20–26, Dec. 2020.
[33] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, pp. 1–19, Feb. 2019.
[34] E. B. P. Kairouz and H. B. McMahan, “Advances and open problems in federated learning,” Found. Trends Mach. Learn., vol. 14, no. 1, pp. 1–210, 2021.
[35] S. Warnat-Herresthal et al., “Swarm learning for decentralized and confidential clinical machine learning,” Nature, vol. 594, pp. 265–270, Jun. 2021.
[36] O. Gupta and R. Raskar, “Distributed learning of deep neural network over multiple agents,” J. Netw. Comput. Appl., vol. 116, pp. 1–8, Aug. 2018.
[37] J. Park et al., “Communication-efficient and distributed learning over wireless networks: Principles and applications,” Proc. IEEE, vol. 109, no. 5, pp. 796–819, May 2021.
[38] T. Chen, K. Zhang, G. B. Giannakis, and T. Basar, “Communication-efficient policy gradient methods for distributed reinforcement learning,” IEEE Trans. Control Netw. Syst., early access, May 6, 2021, doi: 10.1109/TCNS.2021.3078100.
[39] K. Zhang, Z. Yang, H. Liu, T. Zhang, and T. Basar, “Fully decentralized multi-agent reinforcement learning with networked agents,” in Proc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 5872–5881.
[40] Y. Dong, J. Cheng, M. J. Hossain, and V. C. M. Leung, “Secure distributed on-device learning networks with byzantine adversaries,” IEEE Netw., vol. 33, no. 6, pp. 180–187, Nov. 2019.
[41] D. Liu and O. Simeone, “Privacy for free: Wireless federated learning via uncoded transmission with adaptive power control,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 170–185, Jan. 2021.
[42] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, nos. 3–4, pp. 211–407, 2014.
[43] Q. Yu, S. Li, N. Raviv, S. M. M. Kalan, M. Soltanolkotabi, and S. A. Avestimehr, “Lagrange coded computing: Optimal design for resiliency, security, and privacy,” Proc. Mach. Learn. Res., vol. 89, pp. 1215–1225, Apr. 2019.
[44] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, Jan. 2020.
[45] M. M. Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Trans. Signal Process., vol. 68, pp. 2155–2169, 2020.
[46] T. T. Vu, D. T. Ngo, N. H. Tran, H. Q. Ngo, M. N. Dao, and R. H. Middleton, “Cell-free massive MIMO for wireless federated learning,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6377–6392, Oct. 2020.
[47] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
[48] Z. Wang et al., “Federated learning via intelligent reflecting surface,” IEEE Trans. Wireless Commun., early access, Jul. 30, 2021, doi: 10.1109/TWC.2021.3099505.
[49] M. Di Renzo et al., “Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2450–2525, Nov. 2020.
[50] S. Hosseinalipour, C. G. Brinton, V. Aggarwal, H. Dai, and M. Chiang, “From federated to fog learning: Distributed machine learning over heterogeneous wireless networks,” IEEE Commun. Mag., vol. 58, no. 12, pp. 41–47, Dec. 2020.
[51] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-air-ground integrated network: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2714–2741, 4th Quart., 2018.
[52] K. Yang, Y. Shi, and Z. Ding, “Data shuffling in wireless distributed computing via low-rank optimization,” IEEE Trans. Signal Process., vol. 67, no. 12, pp. 3087–3099, Jun. 2019.
[53] K. Yang, Y. Shi, W. Yu, and Z. Ding, “Energy-efficient processing and robust wireless cooperative transmission for edge inference,” IEEE Internet Things J., vol. 7, no. 10, pp. 9456–9470, Oct. 2020.
[54] S. Hua, Y. Zhou, K. Yang, Y. Shi, and K. Wang, “Reconfigurable intelligent surface for green edge inference,” IEEE Trans. Green Commun. Netw., vol. 5, no. 2, pp. 964–979, Jun. 2021.
[55] Y. S. Nasir and D. Guo, “Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2239–2250, Oct. 2019.
[56] A. J. B. Ali, Z. S. Hashemifar, and K. Dantu, “Edge-SLAM: Edge-assisted visual simultaneous localization and mapping,” in Proc. Annu. Int. Conf. Mobile Syst. Appl. Service (MobiSys), Jun. 2020, pp. 325–337.
[57] J. Xu et al., “Edge assisted mobile semantic visual SLAM,” in Proc. IEEE Conf. Comput. Commun. (IEEE INFOCOM), Jul. 2020, pp. 1828–1837.
[58] H. Xie and Z. Qin, “A lite distributed semantic communication system for Internet of Things,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 142–153, Jan. 2021.
[59] J. Liu, Z. Shi, S. Zhang, and N. Kato, “Distributed Q-learning aided uplink grant-free NOMA for massive machine-type communications,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2029–2041, Jul. 2021.
[60] Y. Shen, J. Zhang, S. Song, and K. B. Letaief, “AI empowered resource management for future wireless networks,” 2021, arXiv:2106.06178.
[61] H. Lee, S. H. Lee, and T. Q. S. Quek, “Deep learning for distributed optimization: Applications to wireless resource management,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2251–2266, Oct. 2019.
[62] H. Jang, O. Simeone, B. Gardner, and A. Gruning, “An introduction to probabilistic spiking neural networks: Probabilistic models, learning rules, and applications,” IEEE Signal Process. Mag., vol. 36, no. 6, pp. 64–77, Nov. 2019.
[63] N. Skatchkovsky, H. Jang, and O. Simeone, “Spiking neural networks—Part III: Neuromorphic communications,” IEEE Commun. Lett., vol. 25, no. 6, pp. 1746–1750, Jun. 2021.
[64] R. Li, Z. Zhao, X. Xu, F. Ni, and H. Zhang, “The collective advantage for advancing communications and intelligence,” IEEE Wireless Commun., vol. 27, no. 4, pp. 96–102, Aug. 2020.
[65] Q. Yu et al., “An immunology-inspired network security architecture,” IEEE Wireless Commun., vol. 27, no. 5, pp. 168–173, Oct. 2020.
[66] Q. Yu et al., “A fully-decoupled RAN architecture for 6G inspired by neurotransmission,” J. Commun. Inf. Netw., vol. 4, no. 4, pp. 15–23, Dec. 2019.
[67] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS), vol. 54, 2017, pp. 1273–1282.
[68] J. Wang et al., “A field guide to federated optimization,” 2021, arXiv:2107.06917.
[69] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.
[70] A. H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, and Z. J. Towfic, “Diffusion strategies for adaptation and learning over networks: An examination of distributed strategies and network behavior,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.
[71] Y. Lu and C. De Sa, “Optimal complexity in decentralized training,” in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2021, pp. 7111–7123.
[72] M. Li, D. G. Andersen, A. J. Smola, and K. Yu, “Communication efficient distributed machine learning with the parameter server,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 27, 2014, pp. 19–27.
[73] D. Lee, N. He, P. Kamalaruban, and V. Cevher, “Optimization for reinforcement learning: From a single agent to cooperative agents,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 123–135, May 2020.
[74] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning in adversarial settings: Byzantine gradient descent,” Proc. ACM Meas. Anal. Comput. Syst., vol. 1, no. 2, pp. 1–25, Dec. 2017.
[75] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the Internet of Things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88–99, Sep. 2018.
[76] T. Jiang, Y. Shi, J. Zhang, and K. B. Letaief, “Joint activity detection and channel estimation for IoT networks: Phase transition and computation-estimation tradeoff,” IEEE Internet Things J., vol. 6, no. 4, pp. 6212–6225, Aug. 2019.
[77] Z. Ding, X. Lei, G. K. Karagiannidis, R. Schober, J. Yuan, and V. Bhargava, “A survey on non-orthogonal multiple access for 5G networks: Research challenges and future trends,” IEEE J. Sel. Areas Commun., vol. 35, no. 10, pp. 2181–2195, Oct. 2017.
[78] L. Dai, B. Wang, Y. Yuan, S. Han, C.-L. I, and Z. Wang, “Non-orthogonal multiple access for 5G: Solutions, challenges, opportunities, and future research trends,” IEEE Commun. Mag., vol. 53, no. 9, pp. 74–81, Sep. 2015.
[79] J. Dong, Y. Shi, and Z. Ding, “Blind over-the-air computation and data fusion via provable Wirtinger flow,” IEEE Trans. Signal Process., vol. 68, pp. 1136–1151, 2020.
[80] Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming for green cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, no. 5, pp. 2809–2823, May 2014.
[81] M. Fu, Y. Zhou, Y. Shi, W. Chen, and R. Zhang, “UAV aided over-the-air computation,” 2021, arXiv:2106.00254.
[82] E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge AI: On-demand accelerating deep neural network inference via edge computing,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 447–457, Jan. 2020.
[83] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[84] C.-F. Liu, M. Bennis, M. Debbah, and H. V. Poor, “Dynamic task offloading and resource allocation for ultra-reliable low-latency edge computing,” IEEE Trans. Commun., vol. 67, no. 6, pp. 4132–4150, Jun. 2019.
[85] D. Han, W. Chen, and Y. Fang, “Joint channel and queue aware scheduling for latency sensitive mobile edge computing with power constraints,” IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 3938–3951, Jun. 2020.
[86] J. Shao, Y. Mao, and J. Zhang, “Task-oriented communication for multi-device cooperative edge inference,” 2021, arXiv:2109.00172.
[87] S. Xia, J. Zhu, Y. Yang, Y. Zhou, Y. Shi, and W. Chen, “Fast convergence algorithm for analog federated learning,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2020, pp. 1–6.
[88] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 269–283, Jan. 2021.
[89] L. Li et al., “Delay analysis of wireless federated learning based on saddle point approximation and large deviation theory,” 2021, arXiv:2103.16994.
[90] M. Chen, H. V. Poor, W. Saad, and S. Cui, “Convergence time optimization for federated learning over wireless networks,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2457–2471, Apr. 2021.
[91] Z. Wang, Y. Shi, Y. Zhou, H. Zhou, and N. Zhang, “Wireless-powered over-the-air computation in intelligent reflecting surface-aided IoT networks,” IEEE Internet Things J., vol. 8, no. 3, pp. 1585–1598, Feb. 2021.
[92] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy efficient federated learning over wireless communication networks,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1935–1949, Mar. 2021.
[93] S. Huang, Y. Zhou, T. Wang, and Y. Shi, “Byzantine-resilient federated machine learning via over-the-air computation,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Jun. 2021, pp. 1–6.
[94] Z. Zhang, G. Zhu, R. Wang, V. K. N. Lau, and K. Huang, “Turning channel noise into an accelerator for over-the-air principal component analysis,” 2021, arXiv:2104.10095.
[95] W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders, and K.-R. Müller, “Explaining deep neural networks and beyond: A review of methods and applications,” Proc. IEEE, vol. 109, no. 3, pp. 247–278, Mar. 2021.
[96] Y. Shi, J. Zhang, W. Chen, and K. B. Letaief, “Generalized sparse and low-rank optimization for ultra-dense networks,” IEEE Commun. Mag., vol. 56, no. 6, pp. 42–48, Jun. 2018.
[97] Y. Shi, H. Choi, Y. Shi, and Y. Zhou, “Algorithm unrolling for massive access via deep neural network with theoretical guarantee,” IEEE Trans. Wireless Commun., early access, Aug. 6, 2021, doi: 10.1109/TWC.2021.3100500.
[98] Y. Shi, J. Zhang, B. O’Donoghue, and K. B. Letaief, “Large-scale convex optimization for dense wireless cooperative networks,” IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4729–4743, Sep. 2015.
[99] J. Y. Gotoh, A. Takeda, and K. Tono, “DC formulations and algorithms for sparse optimization problems,” Math. Program., vol. 169, no. 1, pp. 141–176, 2018.
[100] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, “Manopt, a MATLAB toolbox for optimization on manifolds,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1455–1459, 2014.
[101] Y. Shi, J. Zhang, and K. B. Letaief, “Low-rank matrix completion for topological interference management by Riemannian pursuit,” IEEE Trans. Wireless Commun., vol. 15, no. 7, pp. 4703–4717, Jul. 2016.
[102] Y. Shi, J. Zhang, and K. Letaief, “Robust group sparse beamforming for multicast green cloud-RAN with imperfect CSI,” IEEE Trans. Signal Process., vol. 63, no. 17, pp. 4647–4659, Sep. 2015.
[103] Y. Yuan, G. Zheng, K.-K. Wong, B. Ottersten, and Z.-Q. Luo, “Transfer learning and meta learning-based fast downlink beamforming adaptation,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1742–1755, Mar. 2021.
[104] F. Sohrabi, K. M. Attiah, and W. Yu, “Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4044–4057, Jul. 2021.
[105] C. Shen, C. Tekin, and M. van der Schaar, “A non-stochastic learning approach to energy efficient mobility management,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3854–3868, Dec. 2016.
[106] W. Dai, C. Dai, K.-K.-R. Choo, C. Cui, D. Zou, and H. Jin, “SDTE: A secure blockchain-based data trading ecosystem,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 725–737, 2020.
[107] IEEE Guide for Architectural Framework and Application of Federated Machine Learning, IEEE Standard 3652.1-2020, 2021, pp. 1–69.
[108] Harmonizing Standards for Edge Computing, ETSI ISG MEC, Sophia Antipolis, France, Jul. 2020.
[109] C. He et al., “FedML: A research library and benchmark for federated machine learning,” 2020, arXiv:2007.13518.
[110] Fate. [Online]. Available: https://fate.fedai.org/
[111] HarmonyOS. [Online]. Available: https://www.harmonyos.com/en/
[112] M. Grant and S. Boyd. (Mar. 2014). CVX: MATLAB Software for Disciplined Convex Programming, Version 2.1. [Online]. Available: http://cvxr.com/cvx
[113] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd, “Conic optimization via operator splitting and homogeneous self-dual embedding,” J. Optim. Theory Appl., vol. 169, no. 3, pp. 1042–1068, Jun. 2016.
[114] T. Chen et al., “Learning to optimize: A primer and a benchmark,” 2021, arXiv:2103.12828.
[115] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan, and X. Chen, “Convergence of edge computing and deep learning: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869–904, 2nd Quart., 2020.
[116] S. Amakawa et al., “White paper on RF enabling 6G—Opportunities and challenges from technology to spectrum,” 6G Res. Vis., no. 13, pp. 1–68, Apr. 2021.
[117] J. Zhang and K. B. Letaief, “Mobile edge intelligence and computing for the internet of vehicles,” Proc. IEEE, vol. 108, no. 2, pp. 246–261, Feb. 2020.
[118] K. Yang, Y. Shi, Y. Zhou, Z. Yang, L. Fu, and W. Chen, “Federated machine learning for intelligent IoT via reconfigurable intelligent surface,” IEEE Netw., vol. 34, no. 5, pp. 16–22, Sep. 2020.
[119] T. Qiu, J. Chi, X. Zhou, Z. Ning, M. Atiquzzaman, and D. O. Wu, “Edge computing in industrial Internet of Things: Architecture, advances and challenges,” IEEE Commun. Surveys Tuts., vol. 22, no. 4, pp. 2462–2488, 4th Quart., 2020.
[120] D. C. Nguyen et al., “Federated learning for industrial Internet of Things in future industries,” IEEE Wireless Commun., early access, Aug. 6, 2021, doi: 10.1109/MWC.001.2100102.
[121] K. Antonakoglou, X. Xu, E. Steinbach, T. Mahmoodi, and M. Dohler, “Toward haptic communications over the 5G tactile internet,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 3034–3059, 4th Quart., 2018.
[122] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, May 2020.
[123] D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, “QSGD: Communication-efficient SGD via gradient quantization and encoding,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Dec. 2017, pp. 1707–1718.
[124] W. Wen et al., “TernGrad: Ternary gradients to reduce communication in distributed deep learning,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, Dec. 2017, pp. 1508–1518.
[125] J. Bernstein, Y.-X. Wang, K. Azizzadenesheli, and A. Anandkumar, “signSGD: Compressed optimisation for non-convex problems,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 80, Jul. 2018, pp. 560–569.
[126] Y. Du, S. Yang, and K. Huang, “High-dimensional stochastic gradient quantization for communication-efficient edge learning,” IEEE Trans. Signal Process., vol. 68, pp. 2128–2142, 2020.
[127] N. Shlezinger, M. Chen, Y. C. Eldar, H. V. Poor, and S. Cui, “UVeQFed: Universal vector quantization for federated learning,” IEEE Trans. Signal Process., vol. 69, pp. 500–514, 2021.
[128] H. Wang, S. Sievert, S. Liu, Z. B. Charles, D. S. Papailiopoulos, and S. J. Wright, “ATOMO: Communication-efficient learning via atomic sparsification,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2018, pp. 9872–9883.
[129] A. F. Aji and K. Heafield, “Sparse communication for distributed gradient descent,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Sep. 2017, pp. 440–445.
[130] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Hierarchical quantized federated learning: Convergence analysis and system design,” 2021, arXiv:2103.14272.
[131] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi, “Federated learning with compression: Unified analysis and sharp guarantees,” in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS), Mar. 2021, pp. 2350–2358.
[132] J. Sun, T. Chen, G. B. Giannakis, Q. Yang, and Z. Yang, “Lazily aggregated quantized gradient innovation for communication-efficient federated learning,” IEEE Trans. Pattern Anal. Mach. Intell., early access, Oct. 23, 2020, doi: 10.1109/TPAMI.2020.3033286.
[133] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of FedAvg on non-IID data,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2020, pp. 1–26.
[134] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” in Proc. Mach. Learn. Syst. (MLSys), vol. 2, 2020, pp. 429–450.
[135] D. A. E. Acar, Y. Zhao, R. Matas, M. Mattina, P. Whatmough, and V. Saligrama, “Federated learning based on dynamic regularization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2021, pp. 1–36.
[136] C. T. Dinh, N. Tran, and J. Nguyen, “Personalized federated learning with Moreau envelopes,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 21394–21405.
[137] Y. Deng, M. M. Kamani, and M. Mahdavi, “Distributionally robust federated averaging,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 15111–15122.
[138] R. Chen and I. C. Paschalidis, “Distributionally robust learning,” Found. Trends Optim., vol. 4, nos. 1–2, pp. 1–243, 2020.
[139] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017, pp. 4427–4437.
[140] A. Fallah, A. Mokhtari, and A. Ozdaglar, “Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 3557–3568.
[141] R. Pathak and M. J. Wainwright, “FedSplit: An algorithmic framework for fast federated optimization,” 2020, arXiv:2005.05238.
[142] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the objective inconsistency problem in heterogeneous federated optimization,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2020, pp. 1–13.
[143] Y. Ruan, X. Zhang, S.-C. Liang, and C. Joe-Wong, “Towards flexible device participation in federated learning,” in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), vol. 130, Apr. 2021, pp. 3403–3411.
[144] X. Lian, C. Zhang, H. Zhang, C. Hsieh, W. Zhang, and J. Liu, “Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent,” in Proc. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 5330–5340.
[145] S. Savazzi, M. Nicoli, M. Bennis, S. Kianoush, and L. Barbieri, “Opportunities of federated learning in connected, cooperative, and automated industrial systems,” IEEE Commun. Mag., vol. 59, no. 2, pp. 16–21, Feb. 2021.
[146] R. Xin, S. Kar, and U. A. Khan, “Decentralized stochastic optimization and machine learning: A unified variance-reduction framework for robust performance and fast convergence,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 102–113, May 2020.
[147] J. Wang and G. Joshi, “Cooperative SGD: A unified framework for the design and analysis of communication-efficient SGD algorithms,” 2018, arXiv:1808.07576.
[148] A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” in Proc. Int. Conf. Mach. Learn. (ICML), Nov. 2020, pp. 5381–5393.
[149] A. Koloskova, S. Stich, and M. Jaggi, “Decentralized stochastic optimization and gossip algorithms with compressed communication,” in Proc. Int. Conf. Mach. Learn. (ICML), May 2019, pp. 3478–3487.
[150] L. Kong, T. Lin, A. Koloskova, M. Jaggi, and S. U. Stich, “Consensus control for decentralized deep learning,” 2021, arXiv:2102.04828.
[151] G. Neglia, C. Xu, D. Towsley, and G. Calbi, “Decentralized gradient methods: Does topology matter?” in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), vol. 108, Aug. 2020, pp. 2348–2358.
[152] A. Elgabli, J. Park, A. S. Bedi, M. Bennis, and V. Aggarwal, “GADMM: Fast and communication efficient framework for distributed machine learning,” J. Mach. Learn. Res., vol. 21, no. 76, pp. 1–39, 2020.
[153] T. Lin, S. P. Karimireddy, S. U. Stich, and M. Jaggi, “Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data,” 2021, arXiv:2102.04761.
[154] S. J. Wright, “Coordinate descent algorithms,” Math. Program., vol. 151, no. 1, pp. 3–34, 2015.
[155] A. Choromanska et al., “Beyond backprop: Online alternating minimization with auxiliary variables,” in Proc. Int. Conf. Mach. Learn. (ICML), vol. 97, Jun. 2019, pp. 1193–1202.
[156] B. Gu, Z. Dang, X. Li, and H. Huang, “Federated doubly stochastic kernel learning for vertically partitioned data,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (ACM SIGKDD), Aug. 2020, pp. 2483–2493.
[157] Y. Hu, D. Niu, J. Yang, and S. Zhou, “FDML: A collaborative machine learning framework for distributed features,” in Proc. Int. Conf. Knowl. Discovery Data Mining (ACM SIGKDD), Jul. 2019, pp. 2232–2240.
[158] B. Ying, K. Yuan, and A. H. Sayed, “Supervised learning under distributed features,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 977–992, Feb. 2019.
[159] L. Lyu, J. C. Bezdek, J. Jin, and Y. Yang, “FORESEEN: Towards differentially private deep inference for intelligent Internet of Things,” IEEE J. Sel. Areas Commun., vol. 38, no. 10, pp. 2418–2429, Oct. 2020.
[160] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
[161] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in Proc. Int. Conf. Mach. Learn. (ICML), 2016, pp. 1928–1937.
[162] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 6382–6393.
[163] K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” 2019, arXiv:1911.10635.
[164] S. Zeng, A. Anwar, T. Doan, A. Raychowdhury, and J. Romberg, “A decentralized policy gradient approach to multi-task reinforcement learning,” 2020, arXiv:2006.04338.
[165] G. Damaskinos, E. M. El Mhamdi, R. Guerraoui, A. H. A. Guirguis, and S. L. A. Rouault, “AGGREGATHOR: Byzantine machine learning via robust gradient aggregation,” in Proc. Conf. Syst. Mach. Learn. (SysML), 2019, pp. 1–19.
[166] A. Elgabli, J. Park, C. B. Issaid, and M. Bennis, “Harnessing wireless channels for scalable and privacy-preserving federated learning,” IEEE Trans. Commun., vol. 69, no. 8, pp. 5194–5208, Aug. 2021.
[167] W.-N. Chen, P. Kairouz, and A. Ozgur, “Breaking the communication-privacy-accuracy trilemma,” in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 3312–3324.
[168] Z. Wu, Q. Ling, T. Chen, and G. B. Giannakis, “Federated variance-reduced stochastic gradient descent with robustness to Byzantine attacks,” IEEE Trans. Signal Process., vol. 68, pp. 4583–4596, 2020.
[169] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, “Byzantine-robust distributed learning: Towards optimal statistical rates,” in Proc. Int. Conf. Mach. Learn. (ICML), 2018, pp. 5650–5659.
[170] P. Blanchard, E. M. E. Mhamdi, R. Guerraoui, and J. Stainer, “Machine learning with adversaries: Byzantine tolerant gradient descent,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 118–128.
[171] J. So, B. Güler, and A. S. Avestimehr, “Byzantine-resilient secure federated learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2168–2181, Jul. 2021.
[172] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Blockchained on-device federated learning,” IEEE Commun. Lett., vol. 24, no. 6, pp. 1279–1283, Jun. 2020.
[173] M. M. Amiri and D. Gündüz, “Federated learning over wireless fading channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3546–3557, May 2020.
[174] T. Sery and K. Cohen, “On analog gradient descent learning over multiple access fading channels,” IEEE Trans. Signal Process., vol. 68, pp. 2897–2911, 2020.
[175] G. Zhu, Y. Du, D. Gündüz, and K. Huang, “One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 2120–2135, Mar. 2021.
[176] X. Wei and C. Shen, “Federated learning over noisy channels: Convergence analysis and design examples,” 2021, arXiv:2101.02198.
[182] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615–637, Mar. 2021.
[183] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication systems,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148–156, Aug. 2020.
[184] L. Liu and W. Yu, “Massive connectivity with massive MIMO—Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, Jun. 2018.
[185] S. Xia, Y. Shi, Y. Zhou, and X. Yuan, “Reconfigurable intelligent surface for massive connectivity,” 2021, arXiv:2101.10322.
[186] V. Monga, Y. Li, and Y. C. Eldar, “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing,” IEEE Signal Process. Mag., vol. 38, no. 2, pp. 18–44, Mar. 2021.
[187] Y. Shi, J. Dong, and J. Zhang, Low-Overhead Communications in IoT Networks. Singapore: Springer, 2020.
[188] J. Dong, K. Yang, and Y. Shi, “Blind demixing for low-latency communication,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 897–911, Feb. 2019.
[189] J. Dong and Y. Shi, “Nonconvex demixing from bilinear measurements,” IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5152–5166, Oct. 2018.
[190] X. Bian, Y. Mao, and J. Zhang, “Supporting more active users for massive access via data-assisted activity detection,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2021, pp. 1–6.
[191] Y. M. X. Bian and J. Zhang, “Joint activity detection and data decoding in massive random access via a turbo receiver,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Sep. 2021, pp. 1–5.
[192] H. Cheng, Y. Xia, Y. Huang, Z. Lu, and L. Yang, “Deep neural network aided low-complexity MPA receivers for uplink SCMA systems,” IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 9050–9062, Sep. 2021.
[193] N. Ye, X. Li, H. Yu, L. Zhao, W. Liu, and X. Hou, “DeepNOMA: A unified framework for NOMA using deep multi-task learning,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2208–2225, Apr. 2020.
[194] C. Huang, G. Chen, Y. Gong, P. Xu, Z. Han, and J. A. Chambers, “Buffer-aided relay selection for cooperative hybrid NOMA/OMA networks with asynchronous deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2514–2525, Aug. 2021.
[195] Y. Lu, P. Cheng, Z. Chen, W. H. Mow, Y. Li, and B. Vucetic, “Deep multi-task learning for cooperative NOMA: System design and principles,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 61–78, Jan. 2021.
[196] X. Zhai, X. Chen, J. Xu, and D. W. K. Ng, “Hybrid beamforming for massive MIMO over-the-air computation,” IEEE Trans. Commun., vol. 69, no. 4, pp. 2737–2751, Apr. 2021.
[197] G. Zhu and K. Huang, “MIMO over-the-air computation for high-mobility multimodal sensing,” IEEE Internet Things J., vol. 6, no. 4, pp. 6089–6103, Aug. 2019.
[198] Y. Shi, J. Zhang, K. B. Letaief, B. Bai, and W. Chen, “Large-scale convex optimization for ultra-dense cloud-RAN,” IEEE Wireless Commun., vol. 22, no. 3, pp. 84–91, Jun. 2015.
[199] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent advances in cloud radio access networks: System architectures, key techniques, and open issues,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2282–2308, 3rd Quart., 2016.
[200] L. Xing, Y. Zhou, and Y. Shi, “Over-the-air computation via cloud radio access networks,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Montreal, QC, Canada, Jun. 2021, pp. 1–6.
[177] M. M. Amiria, T. M. Dumanb, D. Gündüzc, S. R. Kulkarni, and [201] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and
H. V. Poor, “Collaborative machine learning at the wireless edge with D. J. Love, “Prospective multiple antenna technologies for beyond 5G,”
blind transmitters,” IEEE Trans. Wireless Commun., 2021. IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, Aug. 2020.
[178] H. Liu, X. Yuan, and Y.-J. A. Zhang, “Reconfigurable intelligent [202] G. Zhu, J. Xu, K. Huang, and S. Cui, “Over-the-air computing for
surface enabled federated learning: A unified communication-learning wireless data aggregation in massive IoT,” 2020, arXiv:2009.02181.
design approach,” IEEE Trans. Wireless Commun., early access, [203] M. Fu, Y. Zhou, Y. Shi, and K. B. Letaief, “Reconfigurable intelligent
Jun. 10, 2021, doi: 10.1109/TWC.2021.3086116. surface empowered downlink non-orthogonal multiple access,” IEEE
[179] X. Fan, Y. Wang, Y. Huo, and Z. Tian, “Joint optimization of commu- Trans. Commun., vol. 69, no. 6, pp. 3802–3817, Jun. 2021.
nications and federated learning over the air,” 2021, arXiv:2104.03490. [204] M. M. Amiri and D. Gündüz, “Computation scheduling for distrib-
[180] C. Fang, H. Dong, and T. Zhang, “Mathematical models of overpara- uted machine learning with straggling workers,” IEEE Trans. Signal
meterized neural networks,” Proc. IEEE, vol. 109, no. 5, pp. 683–703, Process., vol. 67, no. 24, pp. 6270–6284, Dec. 2019.
May 2021. [205] T. Bai, C. Pan, Y. Deng, M. Elkashlan, A. Nallanathan, and L. Hanzo,
[181] S. Savazzi, M. Nicoli, and V. Rampa, “Federated learning with cooper- “Latency minimization for intelligent reflecting surface aided mobile
ating devices: A consensus approach for massive IoT networks,” IEEE edge computing,” IEEE J. Sel. Areas Commun., vol. 38, no. 11,
Internet Things J., vol. 7, no. 5, pp. 4641–4654, May 2020. pp. 2666–2682, Nov. 2020.
[206] X. Yuan, Y.-J. A. Zhang, Y. Shi, W. Yan, and H. Liu, “Reconfigurable-intelligent-surface empowered wireless communications: Challenges and opportunities,” IEEE Wireless Commun., vol. 28, no. 2, pp. 136–143, Apr. 2021.
[207] Q. Wu, S. Zhang, B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface-aided wireless communications: A tutorial,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3313–3351, May 2021.
[208] C. Huang et al., “Holographic MIMO surfaces for 6G wireless networks: Opportunities, challenges, and trends,” IEEE Wireless Commun., vol. 27, no. 5, pp. 118–125, Oct. 2020.
[209] W. Fang, Y. Jiang, Y. Shi, Y. Zhou, W. Chen, and K. B. Letaief, “Over-the-air computation via reconfigurable intelligent surface,” 2021, arXiv:2105.05113.
[210] X. Zhu, C. Jiang, L. Yin, L. Kuang, N. Ge, and J. Lu, “Cooperative multigroup multicast transmission in integrated terrestrial-satellite networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 5, pp. 981–992, May 2018.
[211] N. Saeed, A. Elzanaty, H. Almorad, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “CubeSat communications: Recent advances and future challenges,” IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 1839–1862, 3rd Quart., 2020.
[212] Y. Zeng, Q. Wu, and R. Zhang, “Accessing from the sky: A tutorial on UAV communications for 5G and beyond,” Proc. IEEE, vol. 107, no. 12, pp. 2327–2375, Dec. 2019.
[213] H. Zhou, W. Xu, J. Chen, and W. Wang, “Evolutionary V2X technologies toward the internet of vehicles: Challenges and opportunities,” Proc. IEEE, vol. 108, no. 2, pp. 308–323, Feb. 2020.
[214] F. Liu, W. Yuan, C. Masouros, and J. Yuan, “Radar-assisted predictive beamforming for vehicular links: Communication served by sensing,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7704–7719, Nov. 2020.
[215] K. Bonawitz et al., “Towards federated learning at scale: System design,” 2019, arXiv:1902.01046.
[216] N. Cheng et al., “Space/aerial-assisted computing offloading for IoT applications: A learning-based approach,” IEEE J. Sel. Areas Commun., vol. 37, no. 5, pp. 1117–1129, May 2019.
[217] S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter-wave cellular wireless networks: Potentials and challenges,” Proc. IEEE, vol. 102, no. 3, pp. 366–385, Mar. 2014.
[218] T. S. Rappaport et al., “Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,” IEEE Access, vol. 7, pp. 78729–78757, 2019.
[219] R. D. Yates, Y. Sun, D. R. Brown, S. K. Kaul, E. Modiano, and S. Ulukus, “Age of information: An introduction and survey,” IEEE J. Sel. Areas Commun., vol. 39, no. 5, pp. 1183–1210, May 2021.
[220] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “Model compression and acceleration for deep neural networks: The principles, progress, and challenges,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 126–136, Jan. 2018.
[221] S. Li, Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “A scalable framework for wireless distributed computing,” IEEE/ACM Trans. Netw., vol. 25, no. 5, pp. 2643–2654, Oct. 2017.
[222] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
[223] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental tradeoff between computation and communication in distributed computing,” IEEE Trans. Inf. Theory, vol. 64, no. 1, pp. 109–128, Jan. 2018.
[224] M. Goldenbaum, H. Boche, and S. Stańczak, “Nomographic functions: Efficient computation in clustered Gaussian sensor networks,” IEEE Trans. Wireless Commun., vol. 14, no. 4, pp. 2093–2105, Apr. 2015.
[225] F. Wang and V. K. N. Lau, “Multi-level over-the-air aggregation of mobile edge computing over D2D wireless networks,” 2021, arXiv:2105.00471.
[226] K. Li, M. Tao, and Z. Chen, “Exploiting computation replication for mobile edge computing: A fundamental computation-communication tradeoff study,” IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4563–4578, Jul. 2020.
[227] D. Gesbert, S. Hanly, H. Huang, S. S. Shitz, O. Simeone, and W. Yu, “Multi-cell MIMO cooperative networks: A new look at interference,” IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1380–1408, Dec. 2010.
[228] B. Clerckx, H. Joudeh, C. Hao, M. Dai, and B. Rassouli, “Rate splitting for MIMO wireless networks: A promising PHY-layer strategy for LTE evolution,” IEEE Commun. Mag., vol. 54, no. 5, pp. 98–105, May 2016.
[229] J. Shao, H. Zhang, Y. Mao, and J. Zhang, “Branchy-GNN: A device-edge co-inference framework for efficient point cloud processing,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Jun. 2021, pp. 8488–8492.
[230] J. Cao, W. Feng, N. Ge, and J. Lu, “Delay characterization of mobile-edge computing for 6G time-sensitive services,” IEEE Internet Things J., vol. 8, no. 5, pp. 3758–3773, Mar. 2021.
[231] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint pilot and payload power allocation for massive-MIMO-enabled URLLC IIoT networks,” IEEE J. Sel. Areas Commun., vol. 38, no. 5, pp. 816–830, May 2020.
[232] V. Verma et al., “Manifold mixup: Better representations by interpolating hidden states,” in Proc. Int. Conf. Mach. Learn. (ICML), 2019, pp. 6438–6447.
[233] J. Shao and J. Zhang, “BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems,” in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Jun. 2020, pp. 1–6.
[234] M. Jankowski, D. Gündüz, and K. Mikolajczyk, “Wireless image retrieval at the edge,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 89–100, Jan. 2021.
[235] N. Tishby, F. C. N. Pereira, and W. Bialek, “The information bottleneck method,” in Proc. Annu. Allerton Conf. Commun. Control Comput., 2000, pp. 368–377.
[236] I. E. Aguerri and A. Zaidi, “Distributed variational representation learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 120–138, Jan. 2021.
[237] X. Cao, G. Zhu, J. Xu, Z. Wang, and S. Cui, “Optimized power control design for over-the-air federated edge learning,” 2021, arXiv:2106.09316.
[238] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor, “Scheduling policies for federated learning in wireless networks,” IEEE Trans. Commun., vol. 68, no. 1, pp. 317–333, Jan. 2020.
[239] J. Ren, G. Yu, and G. Ding, “Accelerating DNN training in wireless federated edge learning systems,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 219–232, Jan. 2021.
[240] S. Wang et al., “Adaptive federated learning in resource constrained edge computing systems,” IEEE J. Sel. Areas Commun., vol. 37, no. 3, pp. 1205–1221, Jun. 2019.
[241] D. Wen, M. Bennis, and K. Huang, “Joint parameter-and-bandwidth allocation for improving the efficiency of partitioned edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 12, pp. 8272–8286, Dec. 2020.
[242] W. Shi, S. Zhou, Z. Niu, M. Jiang, and L. Geng, “Joint device scheduling and resource allocation for latency constrained wireless federated learning,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 453–467, Jan. 2021.
[243] M. Chen, N. Shlezinger, H. V. Poor, Y. C. Eldar, and S. Cui, “Communication-efficient federated learning,” Proc. Nat. Acad. Sci. USA, vol. 118, no. 17, 2021, Art. no. e2024789118.
[244] C. T. Dinh et al., “Federated learning over wireless networks: Convergence analysis and resource allocation,” IEEE/ACM Trans. Netw., vol. 29, no. 1, pp. 398–409, Feb. 2021.
[245] S. Zheng, C. Shen, and X. Chen, “Design and analysis of uplink and downlink communications for federated learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2150–2167, Jul. 2021.
[246] T. F. de Lima et al., “Machine learning with neuromorphic photonics,” J. Lightw. Technol., vol. 37, no. 5, pp. 1515–1534, Mar. 1, 2019.
[247] M. Alioto, V. De, and A. Marongiu, “Energy-quality scalable integrated circuits and systems: Continuing energy scaling in the twilight of Moore’s law,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 4, pp. 653–678, Dec. 2018.
[248] Q. Zeng, Y. Du, and K. Huang, “Wirelessly powered federated edge learning: Optimal tradeoffs between convergence and power transfer,” 2021, arXiv:2102.12357.
[249] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “How to evaluate deep neural network processors: TOPS/W (alone) considered harmful,” IEEE Solid State Circuits Mag., vol. 12, no. 3, pp. 28–41, Aug. 2020.
[250] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, Dec. 2017.
[251] Y. Mao, J. Zhang, Z. Chen, and K. B. Letaief, “Dynamic computation offloading for mobile-edge computing with energy harvesting devices,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3590–3605, Dec. 2016.
[252] Y. Qu et al., “Decentralized privacy using blockchain-enabled federated learning in fog computing,” IEEE Internet Things J., vol. 7, no. 6, pp. 5171–5183, Jun. 2020.
[253] S. R. Pokhrel and J. Choi, “Federated learning with blockchain for autonomous vehicles: Analysis and design challenges,” IEEE Trans. Commun., vol. 68, no. 8, pp. 4734–4746, Aug. 2020.
[254] Y. Chi, Y. M. Lu, and Y. Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,” IEEE Trans. Signal Process., vol. 67, no. 20, pp. 5239–5269, Oct. 2019.
[255] S. Xia and Y. Shi, “Learning shallow neural networks via provable gradient descent with random initialization,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 5616–5620.
[256] J. Sun, Q. Qu, and J. Wright, “Complete dictionary recovery over the sphere I: Overview and the geometric picture,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 853–884, Feb. 2017.
[257] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efficiently,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 1724–1732.
[258] Y. Shi, J. Cheng, J. Zhang, B. Bai, W. Chen, and K. B. Letaief, “Smoothed Lp-minimization for green cloud-RAN with user admission control,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 1022–1036, Apr. 2016.
[259] Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: A methodological tour d’horizon,” Eur. J. Oper. Res., vol. 290, no. 2, pp. 405–421, Apr. 2021.
[260] Y. Zhang, B. Di, Z. Zheng, J. Lin, and L. Song, “Distributed multi-cloud multi-access edge computing by multi-agent reinforcement learning,” IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2565–2578, Apr. 2021.
[261] B. Feng, J. Gao, Y. Wu, W. Zhang, X.-G. Xia, and C. Xiao, “Optimization techniques in reconfigurable intelligent surface aided networks,” 2021, arXiv:2106.15458.
[262] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
[263] B. R. Marks and G. P. Wright, “A general inner approximation algorithm for nonconvex mathematical programs,” Oper. Res., vol. 26, no. 4, pp. 681–683, 1978.
[264] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20–34, May 2010.
[265] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for interference management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
[266] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sep. 2011.
[267] A. Chowdhury, G. Verma, C. Rao, A. Swami, and S. Segarra, “Unfolding WMMSE using graph neural networks for efficient power allocation,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 6004–6017, Sep. 2021.
[268] Q. Hu, Y. Liu, Y. Cai, G. Yu, and Z. Ding, “Joint deep reinforcement learning and unfolding: Beam selection and precoding for mmWave multiuser MIMO with lens arrays,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2289–2304, Aug. 2021.
[269] Q. Hu, Y. Cai, Q. Shi, K. Xu, G. Yu, and Z. Ding, “Iterative algorithm induced deep-unfolding neural networks: Precoding design for multiuser MIMO systems,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1394–1410, Feb. 2021.
[270] M. Eisen and A. Ribeiro, “Optimal wireless resource allocation with random edge graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 2977–2991, 2020.
[271] M. M. Wadu, S. Samarakoon, and M. Bennis, “Joint client scheduling and resource allocation under channel uncertainty in federated learning,” IEEE Trans. Commun., vol. 69, no. 9, pp. 5962–5974, Sep. 2021.
[272] Y. Shi, J. Zhang, and K. B. Letaief, “Optimal stochastic coordinated beamforming for wireless cooperative networks with CSI uncertainty,” IEEE Trans. Signal Process., vol. 63, no. 4, pp. 960–973, Feb. 2015.
[273] P. M. Esfahani and D. Kuhn, “Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations,” Math. Program., vol. 171, no. 1, pp. 115–166, 2018.
[274] W. Cui, K. Shen, and W. Yu, “Spatial deep learning for wireless scheduling,” IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1248–1261, Jun. 2019.
[275] F. Meng, P. Chen, L. Wu, and J. Cheng, “Power allocation in multi-user cellular networks: Deep reinforcement learning approaches,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6255–6267, Oct. 2020.
[276] C. Huang, R. Mo, and C. Yuen, “Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1839–1850, Aug. 2020.
[277] H. Sun, W. Pu, M. Zhu, X. Fu, T.-H. Chang, and M. Hong, “Learning to continuously optimize wireless resource in episodically dynamic environment,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Jun. 2021, pp. 4945–4949.
[278] X. Liu, Y. Shi, J. Zhang, and K. B. Letaief, “Massive CSI acquisition for dense cloud-RANs with spatial-temporal dynamics,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2557–2570, Apr. 2018.
[279] J.-C. Shen, J. Zhang, E. Alsusa, and K. B. Letaief, “Compressed CSI acquisition in FDD massive MIMO: How much training is needed?” IEEE Trans. Wireless Commun., vol. 15, no. 6, pp. 4145–4156, Jun. 2016.
[280] L. Chen, N. Zhao, Y. Chen, X. Qin, and F. R. Yu, “Computation over MAC: Achievable function rate maximization in wireless networks,” IEEE Trans. Commun., vol. 68, no. 9, pp. 5446–5459, Sep. 2020.
[281] Z. Wang, L. Liu, and S. Cui, “Channel estimation for intelligent reflecting surface assisted multiuser communications: Framework, algorithms, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6607–6620, Oct. 2020.
[282] H. Liu, X. Yuan, and Y.-J.-A. Zhang, “Matrix-calibration-based cascaded channel estimation for reconfigurable intelligent surface assisted multiuser MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2621–2636, Nov. 2020.
[283] C. Hu, L. Dai, S. Han, and X. Wang, “Two-timescale channel estimation for reconfigurable intelligent surface aided wireless communications,” IEEE Trans. Commun., early access, Apr. 12, 2021, doi: 10.1109/TCOMM.2021.3072729.
[284] C. Liu, X. Liu, D. W. K. Ng, and J. Yuan, “Deep residual learning for channel estimation in intelligent reflecting surface-assisted multi-user communications,” IEEE Trans. Wireless Commun., early access, Aug. 3, 2021, doi: 10.1109/TWC.2021.3100148.
[285] Z.-Q. He and X. Yuan, “Cascaded channel estimation for large intelligent metasurface assisted massive MIMO,” IEEE Wireless Commun. Lett., vol. 9, no. 2, pp. 210–214, Feb. 2020.
[286] Y. Ma, Y. Shen, X. Yu, J. Zhang, S. Song, and K. B. Letaief, “Neural calibration for scalable beamforming in FDD massive MIMO with implicit channel estimation,” 2021, arXiv:2108.01529.
[287] I. Afolabi, T. Taleb, K. Samdanis, A. Ksentini, and H. Flinck, “Network slicing and softwarization: A survey on principles, enabling technologies, and solutions,” IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 2429–2453, 3rd Quart., 2018.
[288] E. Björnson, E. A. Jorswieck, M. Debbah, and B. Ottersten, “Multiobjective signal processing optimization: The way to balance conflicting metrics in 5G systems,” IEEE Signal Process. Mag., vol. 31, no. 6, pp. 14–23, Nov. 2014.
[289] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621, 4th Quart., 2018.
[290] N. C. Luong et al., “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3133–3174, 4th Quart., 2019.
[291] Privacy Protection—Privacy Guidelines for Smart Cities, Standard ISO/IEC TS 27570:2021, 2021. [Online]. Available: https://www.iso27001security.com/html/27570.html
[292] Guidelines for Security and Privacy in Internet of Things, Standard ISO/IEC 27030, 2021. [Online]. Available: https://www.iso27001security.com/html/27030.html
[293] Cyber Security for Consumer Internet of Things: Baseline Requirements, Standard ETSI EN 303 645, 2020. [Online]. Available: https://www.etsi.org/newsroom/press-releases/1789-2020-06-etsi-releases-world-leading-consumer-iot-security-standard
[294] Technical Specification Group Services and System Aspects; System Architecture for the 5G System; Stage 2 (Release 15), document 3GPP TS 23.501, Version 15.1.0, 3rd Generation Partnership Project, Mar. 2018.
[295] S. Haykin, An Introduction to Analog and Digital Communication. Hoboken, NJ, USA: Wiley, 1994.
[296] Mosek. [Online]. Available: https://www.mosek.com/
[297] Gurobi. [Online]. Available: https://www.gurobi.com/
[298] A. Mirhoseini et al., “A graph placement methodology for fast chip design,” Nature, vol. 594, no. 7862, pp. 207–212, 2021.
[299] C.-W. Qiu, T. Zhang, G. Hu, and Y. Kivshar, “Quo vadis, metasurfaces?” Nano Lett., vol. 21, no. 13, pp. 5461–5474, Jun. 2021.
[300] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Distributed federated learning for ultra-reliable low-latency vehicular communications,” IEEE Trans. Commun., vol. 68, no. 2, pp. 1146–1159, Feb. 2020.
[301] Y. Hu, M. Chen, W. Saad, H. V. Poor, and S. Cui, “Distributed multi-agent meta learning for trajectory design in wireless drone networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 10, pp. 3177–3192, Oct. 2021.
[302] S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,” Proc. IEEE, vol. 107, no. 8, pp. 1697–1716, Aug. 2019.
[303] D. Scaramuzza and F. Fraundorfer, “Visual odometry [tutorial],” IEEE Robot. Autom. Mag., vol. 18, no. 4, pp. 80–92, Dec. 2011.
[304] C. Cadena et al., “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1309–1332, Dec. 2016.
[305] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser, “Simultaneous localization and mapping: A survey of current trends in autonomous driving,” IEEE Trans. Intell. Veh., vol. 2, no. 3, pp. 194–220, Sep. 2017.
[306] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, vol. 96. Urbana, IL, USA: Univ. Illinois Press, 1949.
[307] E. Sisinni, A. Saifullah, S. Han, U. Jennehag, and M. Gidlund, “Industrial Internet of Things: Challenges, opportunities, and directions,” IEEE Trans. Ind. Informat., vol. 14, no. 11, pp. 4724–4734, Nov. 2018.
[308] F. Tao, H. Zhang, A. Liu, and A. Y. C. Nee, “Digital twin in industry: State-of-the-art,” IEEE Trans. Ind. Informat., vol. 15, no. 4, pp. 2405–2415, Apr. 2019.
[309] G. N. Schroeder, C. Steinmetz, R. N. Rodrigues, R. V. B. Henriques, A. Rettberg, and C. E. Pereira, “A methodology for digital twin modeling and deployment for industry 4.0,” Proc. IEEE, vol. 109, no. 4, pp. 556–567, Apr. 2021.
[310] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, “Blockchain and federated learning for privacy-preserved data sharing in industrial IoT,” IEEE Trans. Ind. Informat., vol. 16, no. 6, pp. 4177–4186, Jun. 2020.
[311] Z. Obermeyer and E. J. Emanuel, “Predicting the future—Big data, machine learning, and clinical medicine,” New England J. Med., vol. 375, no. 13, p. 1216, 2016.
[312] S. K. Zhou et al., “A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises,” Proc. IEEE, vol. 109, no. 5, pp. 820–838, May 2021.
[313] Y. Sun, F. Lo, and B. Lo, “Security and privacy for the internet of medical things enabled healthcare systems: A survey,” IEEE Access, vol. 7, pp. 183339–183355, 2019.
[314] M. Subramanian et al., “Precision medicine in the era of artificial intelligence: Implications in chronic disease management,” J. Transl. Med., vol. 18, no. 1, pp. 1–12, Dec. 2020.
[315] G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren, “Secure, privacy-preserving and federated machine learning in medical imaging,” Nature Mach. Intell., vol. 2, no. 6, pp. 305–311, Jun. 2020.
[316] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split learning for health: Distributed deep learning without sharing raw patient data,” 2018, arXiv:1812.00564.
[317] O. Gottesman et al., “Guidelines for reinforcement learning in healthcare,” Nature Med., vol. 25, no. 1, pp. 16–18, Jan. 2019.
[318] N. Fazeli, M. Oller, J. Wu, Z. Wu, J. B. Tenenbaum, and A. Rodriguez, “See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion,” Sci. Robot., vol. 4, no. 26, pp. 1–22, Jan. 2019.
[319] S. Sundaram, P. Kellnhofer, Y. Li, J.-Y. Zhu, A. Torralba, and W. Matusik, “Learning the signatures of the human grasp using a scalable tactile glove,” Nature, vol. 569, pp. 698–702, May 2019.
[320] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, and G. Fettweis, “5G-enabled tactile internet,” IEEE J. Sel. Areas Commun., vol. 34, no. 3, pp. 460–473, Mar. 2016.
[321] L. Ruan, M. P. I. Dias, and E. Wong, “Achieving low-latency human-to-machine (H2M) applications: An understanding of H2M traffic for AI-facilitated bandwidth allocation,” IEEE Internet Things J., vol. 8, no. 1, pp. 626–635, Jan. 2021.
[322] N. Promwongsa et al., “A comprehensive survey of the tactile internet: State-of-the-art and research directions,” IEEE Commun. Surveys Tuts., vol. 23, no. 1, pp. 472–523, 1st Quart., 2021.
[323] Y. Xiao and M. Krunz, “Distributed optimization for energy-efficient fog computing in the tactile internet,” IEEE J. Sel. Areas Commun., vol. 36, no. 11, pp. 2390–2400, Nov. 2018.
[324] N. Ye, X. Li, H. Yu, A. Wang, W. Liu, and X. Hou, “Deep learning aided grant-free NOMA toward reliable low-latency access in tactile Internet of Things,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2995–3005, May 2019.

Khaled B. Letaief (Fellow, IEEE) received the B.S. (Hons.), M.S., and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in December 1984, August 1986, and May 1990, respectively.
From 1990 to 1993, he was a Faculty Member with the University of Melbourne, Australia. Since 1993, he has been with The Hong Kong University of Science and Technology (HKUST), where he is currently a New Bright Professor of engineering. While at HKUST, he has held many administrative positions, including the Dean of Engineering, the Head of the Electronic and Computer Engineering Department, the Director of the Wireless IC Design Center and the Hong Kong Telecom Institute of Information Technology, and the Founding Director of Huawei Innovation Laboratory. He served as a Consultant for different organizations, including Huawei, ASTRI, ZTE, Nortel, PricewaterhouseCoopers, and Motorola. He is currently an Internationally Recognized Leader in wireless communications and networks with research interest in artificial intelligence, big data analytics systems, mobile cloud and edge computing, tactile internet, and 5G systems and beyond. In these areas, he has over 720 papers along with 15 patents, including 11 U.S. inventions.
Dr. Letaief is a member of the United States National Academy of Engineering and the Hong Kong Academy of Engineering Sciences, and a fellow of the Hong Kong Institution of Engineers. He is well recognized for his dedicated service to professional societies and IEEE, where he has served in many leadership positions. These include the IEEE Communications Society Vice President for conferences, an Elected Member of the IEEE Product Services and Publications Board, and the IEEE Communications Society Vice President for technical activities. He also served as the President for the IEEE Communications Society for the period 2018–19, the world’s leading organization for communications professionals with headquarters in New York City and members in 162 countries. From 2022 to 2023, he will serve as a member of the IEEE Board of Directors. He is also recognized by Thomson Reuters as an ISI Highly Cited Researcher and was listed among the 2020 top 30 of AI 2000 Internet of Things Most Influential Scholars. He was a recipient of many distinguished awards and honors, including the 2021 IEEE Communications Society Best Survey Paper Award, the 2019 Distinguished Research Excellence Award by the HKUST School of Engineering (Highest Research Award and only one recipient/three years is honored for his/her contributions), the 2019 IEEE Communications Society and Information Theory Society Joint Paper Award, the 2018 IEEE Signal Processing Society Young Author Best Paper Award, the 2017 IEEE Cognitive Networks Technical Committee Publication Award, the 2016 IEEE Signal Processing Society Young Author Best Paper Award, the 2016 IEEE Marconi Prize Paper Award in Wireless Communications, the 2011 IEEE Wireless Communications Technical Committee Recognition Award, the 2011 IEEE Communications Society Harold Sobol Award, the 2010 Purdue University Outstanding Electrical and Computer Engineer Award, the 2009 IEEE Marconi Prize Award in Wireless Communications, the 2007 IEEE Communications Society Joseph LoCicero Publications Exemplary Award, and 19 IEEE best paper awards. He is the Founding Editor-in-Chief of the prestigious IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS and has been involved in organizing many flagship international conferences.
Yuanming Shi (Senior Member, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 2011, and the Ph.D. degree in electronic and computer engineering from The Hong Kong University of Science and Technology (HKUST) in 2015. Since September 2015, he has been with the School of Information Science and Technology, ShanghaiTech University, where he is currently a Tenured Associate Professor. He visited the University of California, Berkeley, CA, USA, from October 2016 to February 2017. His research areas include optimization, statistics, machine learning, wireless communications, and their applications to 6G, the IoT, and edge AI. He was a recipient of the 2016 IEEE Marconi Prize Paper Award in Wireless Communications, the 2016 Young Author Best Paper Award by the IEEE Signal Processing Society, and the 2021 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award. He is also an Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS and IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS.

Jianhua Lu (Fellow, IEEE) received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 1986 and 1989, respectively, and the Ph.D. degree in electrical and electronic engineering from The Hong Kong University of Science and Technology, Hong Kong, China, in 1998. Since 1989, he has been with the Department of Electronic Engineering, Tsinghua University, where he currently serves as a Professor. He has authored/coauthored over 300 refereed technical papers published in internationally renowned journals and conferences and over 80 Chinese invention patents. His research interests include broadband wireless communications, multimedia signal processing, and satellite communications.
Prof. Lu is a member of the Chinese Academy of Sciences. He is the Vice President of the National Natural Science Foundation of China. He was a recipient of the Best Paper Awards at the IEEE ICCCS 2002, China Communications 2006, IEEE Embedded-Com 2012, IEEE WCSP 2015, IEEE IWCMC 2017, and IEEE ICNC 2019. He served as the program committee co-chair and a TPC member for many international conferences, and an Editor for IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS from 2008 to 2011. He is the Editor-in-Chief of China Communications.

Jianmin Lu joined Huawei Technologies in 1999. He is currently the Executive Director of Huawei Wireless Technology Laboratory. During the last two decades, he has conducted research on wireless communications, especially on the physical layer and MAC layer, and developed 3G, 4G, and 5G products. He received more than 50 patents during this research. He was deeply involved in 3GPP2 (EVDO/UMB), WiMAX/802.16m, and 3GPP (LTE/NR) standardization and contributed several key technologies, such as flexible radio frame structure, radio resource management, and MIMO. His current research interests are in signal processing, protocols, and networking for next-generation wireless communications.
