
Machine and Deep Learning for Resource Allocation in Multi-Access Edge Computing: A Survey

Hamza Djigal, Jia Xu, Senior Member, IEEE, Linfeng Liu, Member, IEEE, and Yan Zhang, Fellow, IEEE

Abstract—With the rapid development of Internet-of-Things (IoT) devices and mobile communication technologies, Multi-access Edge Computing (MEC) has emerged as a promising paradigm that extends cloud computing and storage capabilities to the edge of cellular networks, close to IoT devices. MEC enables IoT devices with limited battery capacity and computation/storage capabilities to execute their computation-intensive and latency-sensitive applications at the edge of the network. However, to efficiently execute these applications in MEC systems, each task must be properly offloaded and scheduled onto the MEC servers. Additionally, the MEC servers may intelligently balance and share their computing resources to satisfy the application QoS and QoE. Therefore, effective resource allocation (RA) mechanisms in MEC are vital for ensuring its foreseen advantages. Recently, Machine Learning (ML) and Deep Learning (DL) have emerged as key methods for many challenging aspects of MEC. In particular, ML and DL play a crucial role in addressing the challenges of RA in MEC. This paper presents a comprehensive survey of ML/DL-based RA mechanisms in MEC. We first present tutorials that demonstrate the advantages of applying ML and DL in MEC. Then, we present enabling technologies for quickly running ML/DL training and inference in MEC. Afterward, we provide an in-depth survey of recent works that used ML/DL methods for RA in MEC from three aspects: (1) ML/DL-based methods for task offloading; (2) ML/DL-based methods for task scheduling; and (3) ML/DL-based methods for joint resource allocation. Finally, we discuss key challenges and future research directions of applying ML/DL for resource allocation in MEC networks.

Index Terms—Multi-access edge computing, resource allocation, task offloading, task scheduling, machine learning, deep learning, IoT applications.

This work was supported in part by the National Natural Science Foundation of China under grants 61872193, 61872191 and 62072254, and in part by the National Foreign Expert Program of China under grant QN2022014001. (Corresponding author: Jia Xu)
Hamza Djigal, Jia Xu and Linfeng Liu are with the Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China (e-mail: djigalshamza@gmail.com, {xujia, liulf}@njupt.edu.cn).
Yan Zhang is with the Department of Informatics, University of Oslo, 0316 Oslo, Norway (e-mail: yanzhang@ieee.org).

I. INTRODUCTION

WITH the recent progress in information and mobile communication technologies, such as fifth-generation mobile networks (5G), the expectations for quality of service (QoS) and Quality of Experience (QoE) are rising rapidly. Tens of billions of resource-limited wireless smart devices, such as mobile user equipment (UEs), sensors, and wearable devices, can connect to the Internet through 5G networks [1]. The number of IoT devices is expected to increase to 25.2 billion by 2025, and it is estimated that 3.1 billion of the connected IoT devices will use cellular technology, creating business opportunities for enterprises and mobile operators [2]. The 5G networks also bring a range of benefits to end-users and IoT devices, including ultra-reliability (99.999%), very low latency (below 5 ms), high bandwidth (10 Gbps), and the ability to support 1000 times higher data volumes [3]. The 5G networks will greatly improve users' QoS and QoE and facilitate the demand for smart applications (e.g., smart cars, smart energy grids, smart houses) [2].

Due to their limited resource capacities (computation, storage, etc.) and finite battery capacity, IoT devices are not suitable for supporting latency-sensitive or computation-intensive applications, or IoT applications providing 5G services such as online gaming and video services [4]. To overcome this challenge, the concept of mobile cloud computing (MCC) was introduced to enable IoT devices to offload their computation-intensive applications to powerful centralized remote clouds, which are accessible via the Internet or a core network (CN) of a mobile operator [5]. However, MCC incurs high latency because data is offloaded to remote cloud servers that are located far away from the IoT devices. To address this issue, the Mobile Edge Computing paradigm was introduced by ETSI ISG [6] to move cloud computation and storage capabilities closer to the end-users. In September 2017, ETSI ISG officially renamed it Multi-access Edge Computing (MEC) to better reflect that the edge is not only based on mobile networks but can also refer to various networks such as WiFi and fixed access technologies [7], [8], [3]. MEC comprises edge servers located at the edge of the network and implemented at access points such as base stations (BSs), radio access networks (RANs) for LTE/5G, hot spots, data centers (DCs), routers, switches, and WiFi access points (WAPs).

The basic principle of MEC is to bring cloud computation and storage capabilities closer to the edge of the network. Hence, MEC can provide ultra-low latency and higher reliability compared to MCC. In addition, due to its ultra-low latency and high bandwidth characteristics, MEC brings new business opportunities to the major stakeholders, including mobile operators, application developers, telecom equipment and software vendors, and IT platform and technology vendors [9]. For instance, a mobile operator can maximize its revenue by offering open access to its MEC platforms through Application Programming Interfaces (APIs) to service providers and by suggesting usage-based charging for resources (e.g., computation, storage, and bandwidth) [3].


Another advantage of MEC is that it enables IoT devices with limited battery capacity and limited computation/storage capabilities to execute their computation-intensive and latency-sensitive applications at the edge of the network. However, to efficiently execute these applications in MEC systems, each task must be properly offloaded and scheduled onto the edge/cloud servers. Moreover, multiple edge servers can collaboratively offload their computation-intensive tasks to each other through a backhaul network to offer better services to end-users by balancing and sharing their computation resources [4]. For example, a nearby edge server can decide to offload the computation task of a connected mobile device to another edge server if it does not have the required computing resources to process the task within its deadline. Hence, efficient resource allocation mechanisms in MEC are vital for its foreseen advantages. The main objective of this paper is to provide an in-depth survey of recent works that used ML and DL methods to address the resource allocation problem in MEC, which we divide into three sub-problems: (1) the task offloading problem, (2) the task scheduling problem, and (3) the joint resource allocation problem.

Traditionally, the resource allocation problem is solved using optimization methods such as heuristics or meta-heuristics, which are not globally optimal [10]. These traditional solutions are also not suitable for delay-sensitive and data-intensive applications since they are computationally expensive. Moreover, the time complexity of these traditional approaches increases proportionally with the number of tasks and the network size. To overcome these drawbacks, some recent studies have begun to investigate ML and DL methods to solve the resource allocation problem in MEC. For instance, Huang et al. [11] proposed a DL-based task offloading and bandwidth allocation mechanism in MEC, which minimizes the overall offloading cost. Moreover, ML and DL techniques could be vital for optimizing the resource allocation process while meeting the QoS and QoE requirements of the application, because they can intelligently predict unknown QoS or QoE requirements by learning from historical data.

Fig. 1 illustrates an ML/DL-enabled intelligent framework for resource allocation in MEC. In the intelligent device layer, a machine learning algorithm is embedded in each intelligent device (e.g., mobile device), which can make offloading decisions. In the intelligent MEC layer, an ML algorithm selects the suitable computation resources (server decisions) that can execute the offloaded tasks. If the offloaded tasks are allocated to an MEC server, a DL-based scheduler embedded in the MEC server makes the scheduling decision (i.e., task prioritization and assignment). Generally, the DL training process is done in the intelligent cloud layer because it requires powerful computing resources [12]. Additionally, the ML/DL-based offloading and scheduling decisions depend not only on the data size and computation resource capabilities but also on the ML/DL model to be executed.

Fig. 1: The framework of ML/DL-enabled intelligent resource allocation in MEC.
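To make this workflow concrete, the following minimal Python sketch (illustrative only; the cost thresholds and policy functions are hypothetical stand-ins for the learned ML/DL models, not an implementation of the framework in Fig. 1) wires the three decisions together: an on-device offloading decision, MEC-layer server selection, and a scheduler on the chosen server.

```python
# Minimal, hypothetical sketch of the Fig. 1 decision flow (device -> MEC -> scheduler).
# The policies below are simple stand-ins for the learned ML/DL models described in the text.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    cycles: float          # required CPU cycles (Gcycles)
    data_mb: float         # input data to transfer (MB)

@dataclass
class Server:
    name: str
    cpu_ghz: float         # available CPU frequency
    queue: List[Task] = field(default_factory=list)

def device_offload_decision(task: Task, local_ghz: float, uplink_mbps: float) -> bool:
    """Stand-in for the on-device ML policy: offload if the estimated edge time
    (transfer + remote compute on a nominal 10 GHz server) beats local execution."""
    local_time = task.cycles / local_ghz
    edge_time = task.data_mb * 8 / uplink_mbps + task.cycles / 10.0
    return edge_time < local_time

def mec_select_server(task: Task, servers: List[Server]) -> Server:
    """Stand-in for the MEC-layer ML model: pick the server with the smallest
    estimated completion time (queued work plus this task)."""
    def est_finish(s: Server) -> float:
        backlog = sum(t.cycles for t in s.queue)
        return (backlog + task.cycles) / s.cpu_ghz
    return min(servers, key=est_finish)

def dl_scheduler(server: Server) -> List[Task]:
    """Stand-in for the DL-based scheduler: prioritize the queued tasks
    (here simply shortest-job-first)."""
    return sorted(server.queue, key=lambda t: t.cycles)

if __name__ == "__main__":
    servers = [Server("edge-1", 8.0), Server("edge-2", 12.0)]
    for task in [Task("t1", 4.0, 2.0), Task("t2", 20.0, 1.0)]:
        if device_offload_decision(task, local_ghz=1.5, uplink_mbps=50.0):
            servers_choice = mec_select_server(task, servers)
            servers_choice.queue.append(task)
        else:
            print(f"{task.name}: run locally")
    for s in servers:
        print(s.name, [t.name for t in dl_scheduler(s)])
```

In practice each stub would be replaced by a trained model, but the control flow (offload decision, then server selection, then scheduling) stays the same.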

A. Existing Surveys on Resource Allocation

In this section, we first present previous surveys on task offloading in MEC. Then, we discuss related surveys on task scheduling in various distributed systems, such as grid computing, cloud computing, fog computing, and MEC. Additionally, we classify the related surveys on traditional techniques for resource allocation (Table I). Finally, we compare the existing surveys with our survey in terms of different aspects, as shown in Table II.

1) Existing Surveys on Task Offloading: In [13], the authors present a comprehensive survey on data offloading techniques in cellular networks. They classify the existing techniques into two main categories, namely delayed offloading and non-delayed offloading, according to the delay that the data may tolerate. In delayed offloading, packet reception may be purposely delayed up to a certain time to achieve more beneficial delivery conditions. In non-delayed offloading, no additional delay is added to packet reception except the delay caused by packet processing. Since there is no extra delay in non-delayed offloading, the QoS requirements are preserved. The authors of [5] address the computation offloading decision problem in MEC. They classify the offloading decision approaches into full offloading and partial offloading. The main idea of the full offloading approach is to offload the whole computation task to the MEC servers. In partial offloading, a part of the computation task is processed locally by the device while the rest is processed by the MEC servers. In [20], the authors present a survey on opportunistic offloading and classify the approaches into two main categories: traffic offloading and computation offloading. In traffic offloading, mobile devices download contents via the cellular network and send them to other nodes via opportunistic communication. In computation offloading, the computationally intensive tasks of a mobile device with limited computing resources are offloaded through opportunistic communication to other nearby mobile devices that have enough computing capacity to process the tasks. In [24], the authors present a survey and taxonomy on task offloading in edge-cloud environments. They analyze the task offloading solutions from five aspects: task types, offloading schemes (i.e., full offloading and partial offloading), objectives, device mobility, and multi-hop cooperation.

TABLE I: Summary of Existing Surveys on Conventional Techniques for Resource Allocation in Cloud/Edge/Fog

2015, [13]: Data offloading techniques in cellular networks.
2016, [14]: Taxonomy for classifying 109 scheduling problems and solutions in DS.
2017, [15]: Taxonomy of task allocation with temporal and ordering constraints.
2017, [10]: Taxonomy and survey on scheduling algorithms in IaaS cloud.
2017, [5]: Survey on architecture and computation offloading in MEC.
2017, [16]: Survey of economic and pricing models for resource management in cloud.
2018, [17]: Survey of cluster frameworks and scheduling strategies in data center networks.
2018, [18]: Survey of task scheduling methods in desktop grid computing systems.
2018, [19]: Taxonomy of the scheduling problem in cloud.
2018, [20]: Survey of opportunistic offloading: traffic and computation offloading.
2019, [21]: Systematic review and taxonomy of scheduling techniques in cloud.
2019, [22]: Survey of meta-heuristic scheduling techniques in cloud.
2019, [23]: Taxonomy and survey on scheduling techniques in fog-cloud.
2020, [24]: Survey and taxonomy on task offloading for edge-cloud computing.
2020, [25]: Survey of computation offloading modeling in edge computing.
2020, [26]: Multiple workflows scheduling problems in multi-tenant distributed systems.
2020, [27]: Survey of smartphone perspective on computation offloading.
2021, [28]: Survey of task offloading in MEC.
2021, [29]: Survey of resource allocation in NFV.
2021, [30]: Survey of collaborative task scheduling in edge computing.
2021, [31]: Resource allocation in heterogeneous 5G networks.

TABLE II: Comparison between Existing Surveys and our Survey: Emerging Techniques for Resource Allocation in MEC

2019, [32]: Discussed key factors that enable the implementation of DL in mobile networking applications.
2019, [33]: Discussed techniques for quickly running DL inference in MEC.
2020, [34]: Discussed federated learning for MEC optimization.
2020, [35]: Mainly discussed ML-based resource allocation techniques for HetNets, MIMO, D2D, and NOMA networks.
2020, [25]: Survey of computation offloading modeling in edge computing.
2020, [36]: Survey of ML-based computation offloading modeling in edge computing.
2020, [37]: DL for 5G networks.
2020, [38]: Survey of stochastic-based offloading mechanisms in edge/cloud.
2020, [39]: Investigated key techniques about the convergence of DL and MEC, MEC for DL, and DL for edge.
2020, [40]: Survey on edge intelligence; mainly discusses caching, training/inference, and offloading methods in MEC.
2020, [41]: Survey on MEC for 5G and IoT; mainly discusses enabling technologies for MEC in 5G.
2021, [42]: Survey on task offloading in edge and cloud computing.
2021, [43]: Mainly focused on conventional techniques for resource scheduling in MEC.
Our survey: In-depth survey of ML/DL-based resource allocation mechanisms in MEC, covering ML/DL-enabled MEC, enabling techniques for ML/DL tasks in MEC, taxonomies of both ML and DL for RA, and in-depth reviews of ML/DL-based offloading, scheduling, and joint RA (see Section I-B for key contributions).

The authors of [25] provide a survey on computation offloading modeling in edge computing. Since the offloading problem is an optimization problem, they classify the offloading modeling approaches from different perspectives, including convex and non-convex optimization, Markov Decision Process (MDP), game theory, Lyapunov optimization, and machine learning.


In [36], the authors present a survey on ML-based computation offloading in MEC systems and classify the solutions into reinforcement learning, supervised learning, and unsupervised learning approaches. In contrast to Shakarami et al. [36], the authors of [38] provide a survey on stochastic-based offloading approaches in MCC, MEC, and fog computing systems. They propose a taxonomy that classifies the approaches into three Markov models, namely Markov chain, Markov process, and hidden Markov. In [39], the authors investigate the convergence of MEC and deep learning. They mainly discuss enabling techniques for the integration of MEC and DL, i.e., DL applications in MEC, DL training/inference in MEC, MEC for DL services, and DL for optimizing MEC.

2) Existing Surveys on Task Scheduling: In [44], the authors address the task allocation and load balancing problems in distributed systems. They focus on five main aspects: control, resource optimization, reliability, coordination strategy among heterogeneous nodes, and network structure. The authors of [23] provide a survey on scheduling techniques in different cloud models, including traditional, serverless, and fog-cloud environments. In [35], the authors investigate ML algorithms for resource management in wireless IoT networks. They mainly survey machine learning techniques for emerging cellular IoT networks such as MIMO, D2D communications, and NOMA networks. However, they did not address the task offloading problem, which is vital for ensuring high performance in IoT networks, especially in the presence of a large number of computationally intensive tasks. The authors of [30] present a survey of collaborative task scheduling problems in edge computing. They analyze the problem from four main perspectives: computing architectures (e.g., device-edge, device-edge-cloud), computation task models (e.g., local execution, offloading types), optimization objectives, and scheduling methods. In [43], the authors present a comprehensive survey of resource scheduling in edge computing. They classify the existing works from three aspects: computation offloading, resource allocation, and resource provisioning. They also discuss different techniques of resource scheduling such as heuristics, approximation, game theory, and machine learning. Compared to [30], the authors of [43] provide more details about the scheduling methods and the optimization objectives. However, they do not provide an in-depth review of ML/DL-based methods for resource scheduling either.

In summary, most of the existing surveys on resource allocation focused either on the task offloading problem or on the task scheduling problem, ignoring the joint task offloading and scheduling problem. Also, as shown in Table I, the majority of the existing surveys focused on traditional resource allocation methods such as heuristics, meta-heuristics, and game theory, ignoring the emerging ML and DL techniques. Moreover, the emerging ML and DL techniques that have been used to solve the resource allocation problem in MEC have not been comprehensively discussed in the existing surveys. To the best of our knowledge, there is no survey that thoroughly discusses the application of ML and DL for resource allocation in MEC while considering the other aspects shown in Table II, i.e., ML/DL-enabled MEC, enabling techniques for ML/DL tasks in MEC, a taxonomy of ML/DL for resource allocation, and an in-depth review of works focused on ML/DL for resource allocation.

Motivated by this, we propose an in-depth survey of ML/DL-based resource allocation methods in MEC. In particular, we survey recent works that used ML and DL techniques to address the task offloading, task scheduling, and joint resource allocation problems in MEC while considering the other aspects shown in Table II.

B. Contributions

In contrast to the existing surveys depicted in Table I and Table II, this survey focuses on ML and DL techniques for resource allocation in MEC. The key contributions of this article are as follows:
• We discuss the advantages of applying ML and DL for MEC (ML/DL-enabled MEC) by presenting three use cases from three perspectives: end-users, service providers, and networking services.
• We discuss potential technologies for quickly running ML and DL tasks (i.e., training and inference) in MEC.
• We discuss potential ML and DL algorithms for resource allocation in MEC and summarize their advantages and disadvantages.
• We conduct a comprehensive and in-depth survey of recent works that used ML and DL methods to address the resource allocation problem in MEC. In particular, we discuss and classify current ML- and DL-based methods for resource allocation from three aspects: task offloading, task scheduling, and joint resource allocation.
• We discuss lessons learned from the state-of-the-art ML- and DL-based methods for resource allocation in MEC, which will help researchers understand how and when ML- and DL-based methods outperform traditional techniques for resource allocation in MEC.
• We discuss key challenges and present future research directions of applying ML and DL for resource allocation in MEC.

The rest of the paper is organized as follows. Fig. 2 shows the structure of the paper, and Table III lists all the acronyms used in the paper. Section II presents an overview of MEC, ML/DL, and resource allocation. Section III discusses the advantages brought by ML and DL to MEC. Section IV presents enabling technologies for ML and DL tasks in MEC. Section V presents potential ML and DL techniques for resource allocation. Section VI presents an in-depth survey of ML/DL-based methods for task offloading in MEC. In Section VII, we thoroughly survey state-of-the-art ML and DL methods for task scheduling in MEC. Section VIII reviews recent works addressing the joint resource allocation problem using ML/DL methods. In Section IX, we discuss challenges and future research directions. Finally, Section X concludes this survey.

II. OVERVIEW

In this section, we summarize the basics of MEC, ML/DL, and resource allocation.


Fig. 2: Structure of the survey.

A. Multi-Access Edge Computing (MEC)

Since the fundamentals of MEC have been widely studied in the literature [3], [5], [8], [45], [46], in this sub-section we discuss key networking technologies for MEC realization, including Network Function Virtualization (NFV) and Software Defined Networking (SDN). Before introducing these networking technologies, it is natural to answer the following question: why do we need Multi-access Edge Computing?

1) Why Do We Need Multi-Access Edge Computing?: Traditionally, huge volumes of data (big data) are mainly stored and analyzed in powerful remote cloud data centers. Today, however, with the increasing number of smart devices (with limited resources and battery capacity) connecting to the Internet through 4G/5G networks, processing the data generated by the devices on the remote cloud incurs high latency, because cloud servers are located far away from the smart devices (i.e., the data sources). Hence, the MEC paradigm is introduced by ETSI ISG [6] to bring cloud services closer to the data sources and the end-users. MEC is characterized by ultra-low latency and high bandwidth.

2) Network Function Virtualization (NFV): NFV is a network concept that aims to manage networking functions by evolving virtualization technology [47]. It has been shown that NFV has many benefits, such as reducing the monetary cost of hardware infrastructure, optimizing the quality of service deployment, orchestrating many virtual networks, and scaling network services [8]. NFV allows network service providers and vendors to implement network functions in software on top of virtualization technologies rather than running them on purpose-built hardware. NFV also has several use cases and applications, including traffic analysis [48] and security threat analysis [49], [50]. In NFV, services are implemented by a sequence of Virtual Network Functions (VNFs) that can run on servers [29]; such an ordered sequence of VNFs is also called a Service Function Chain (SFC).
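As a concrete illustration of the SFC notion, an SFC can be viewed as an ordered chain of VNFs whose placement on edge servers determines the end-to-end service latency. The sketch below is a minimal, hypothetical example (the VNF names, latencies, and server names are invented; it is not code from [29] or [52]).

```python
# Hypothetical sketch: an SFC as an ordered chain of VNFs placed on edge servers.
# End-to-end latency = per-VNF processing time + link delay between consecutive hosts.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class VNF:
    name: str
    proc_ms: float          # processing time on its assigned server (ms)

def sfc_latency(chain: List[VNF],
                placement: Dict[str, str],
                link_delay_ms: Dict[Tuple[str, str], float]) -> float:
    """Sum processing times along the chain, adding link delay whenever two
    consecutive VNFs are placed on different servers."""
    total = 0.0
    prev_host = None
    for vnf in chain:
        host = placement[vnf.name]
        if prev_host is not None and host != prev_host:
            total += link_delay_ms.get((prev_host, host), 0.0)
        total += vnf.proc_ms
        prev_host = host
    return total

if __name__ == "__main__":
    chain = [VNF("firewall", 0.4), VNF("nat", 0.2), VNF("video-optimizer", 1.5)]
    placement = {"firewall": "edge-1", "nat": "edge-1", "video-optimizer": "edge-2"}
    links = {("edge-1", "edge-2"): 0.8}
    print(f"end-to-end latency: {sfc_latency(chain, placement, links):.1f} ms")
```

Placement and scheduling decisions for such chains are exactly the VNF/SFC resource allocation challenges discussed below.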


TABLE III: Acronyms Used in the Paper

A3C: Asynchronous Advantage Actor-Critic | FCM: Fuzzy C-Means | PPO: Proximal Policy Optimization
ABC: Artificial Bee Colony | FIFO: First In, First Out | PSO: Particle Swarm Optimization
ACO: Ant Colony Optimization | FCFS: First Come First Serve | QL: Q-Learning
AI: Artificial Intelligence | FF: First Fit | QoE: Quality of Experience
ANN: Artificial Neural Network | GA: Genetic Algorithm | QoS: Quality of Service
AE: Autoencoder | Gbps: Gigabits per second | RA: Resource Allocation
API: Application Programming Interface | GPU: Graphics Processing Unit | RAE: Relative Absolute Error
BS: Base Station | HCA: Hierarchical Clustering Algorithm | RAN: Radio Access Network
CART: Classification and Regression Tree | ILP: Integer Linear Programming | RF: Random Forest
CN: Core Network | IoT: Internet-of-Things | RL: Reinforcement Learning
CMDP: Constrained Markov Decision Process | JTOS: Joint Task Offloading and Scheduling | RNN: Recurrent Neural Network
CNN: Convolutional Neural Network | KNN: K-Nearest Neighbors | RR: Round Robin
CPU: Central Processing Unit | LP: Linear Programming | RSU: Road Side Unit
CRL: Clustered Reinforcement Learning | LPL: Local Processing Policy | SA: Simulated Annealing
CRN: Cognitive Radio Network | LTE: Long Term Evolution | Seq2Seq: Sequence-to-Sequence
CSI: Channel State Information | LSTM: Long Short-Term Memory | SL: Supervised Learning
D3QN: Dueling Double Deep Q-Network | MAQL: Multi-Agent Q-Learning | SLA: Service Level Agreement
D2D: Device-to-Device | MCC: Mobile Cloud Computing | SDN: Software Defined Networking
DAG: Directed Acyclic Graph | MCTS: Monte Carlo Tree Search | SVM: Support Vector Machine
DC: Data Center | MDP: Markov Decision Process | SOM: Self-Organizing Map
DCS: Distributed Computing System | MEC: Multi-access Edge Computing | TD: Temporal Difference
DE: Differential Evolution | MGM: Markov Game Model | TL: Transfer Learning
DL: Deep Learning | MIMO: Multiple Input Multiple Output | UE: User Equipment
DNN: Deep Neural Network | MINLP: Mixed Integer Nonlinear Programming | USL: Unsupervised Learning
DRL: Deep Reinforcement Learning | ML: Machine Learning | V2I: Vehicle-to-Infrastructure
DT: Decision Tree | MLP: Multilayer Perceptron | V2V: Vehicle-to-Vehicle
DTL: Deep Transfer Learning | MRL: Meta-Reinforcement Learning | VM: Virtual Machine
EFT: Earliest Finish Time | ms: millisecond | WAP: WiFi Access Point
EL: Ensemble Learning | MTL: Multi-task Transfer Learning
EPG: Exact Potential Game | NFV: Network Functions Virtualization
EST: Earliest Start Time | NOMA: Non-Orthogonal Multiple Access
ET: Execution Time | PDS: Post Decision State

Furthermore, NFV offers a more efficient and scalable allocation of network resources to network functions, and it can therefore significantly reduce both the operating expenses (OPEX) and capital expenses (CAPEX) of network service providers [51].

Although NFV brings a range of benefits to both academia and industry, resource allocation in NFV, and in MEC in particular, raises new challenges. For instance, since the network functions in NFV must be executed in a specific order, it is crucial to investigate efficient VNF offloading and placement mechanisms. The scheduling of SFCs is another challenge that needs to be considered in MEC. Towards this, the authors of [52] propose a deep-learning-based approach to address the SFC scheduling problem in MEC. In [53], the authors present a survey on hardware acceleration techniques for NFV. Moreover, the work in [29] provides a comprehensive survey of resource allocation problems in NFV.

3) Software Defined Networking (SDN): SDN is another networking technology for designing cost-effective and adaptable networks [54]. Many studies focus on the convergence of SDN and NFV in the MEC network. For instance, LightMANO [55] is a multi-access networking framework that converges SDN and NFV into a single lightweight platform for the management and orchestration of network services over distributed NFV systems. Other studies investigate the integration of networking technologies with MEC, such as [56], which investigates the convergence of O-RAN with MEC, SON, and network slicing in 5G networks, and [57], which studies the pairing of cloud RAN and MEC. The authors of [51] present a comprehensive survey of NFV and its relationship with SDN. The works in [58], [59], [60] investigate the advantages of SDN and NFV in the ecosystem of MEC. LayBack [61] is an architecture that facilitates communication and computation resource sharing among different networking technologies. Moreover, the authors of [62] provide a comprehensive survey of networking technologies for ultra-low latency (ULL) applications.

B. Machine Learning (ML) and Deep Learning (DL)

Many surveys have already discussed the basics of ML and DL techniques, such as [73], [63], [74], [75], [69], [70]. For this reason, in this section we discuss popular artificial neural networks, which are commonly used for training and inference in MEC. Table IV summarizes existing surveys on ML/DL in MEC networks.

1) Popular Neural Networks: Artificial neural networks (ANNs), commonly called Neural Networks (NNs), are series of mathematical functions that attempt to retrieve the desired output from an input dataset by mimicking the way biological neurons operate. A neural network comprises layers of interconnected nodes: an input layer, one or more hidden layers, and an output layer. Each node (or perceptron, also called an artificial neuron) is associated with a weight and a threshold. Neural networks are at the core of DL algorithms. A neural network has two main phases, training and inference, which are discussed in the next sub-section.
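To make the node-level computation concrete, the following minimal sketch (not taken from the surveyed works) shows a single artificial neuron computing a weighted sum of its inputs and applying a threshold-style activation.

```python
# Hypothetical sketch: a single artificial neuron (perceptron) as described above.
# Output = activation(weighted sum of inputs + bias).
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Weighted sum followed by a step ("threshold") activation."""
    z = float(np.dot(weights, inputs) + bias)
    return 1.0 if z > 0.0 else 0.0   # replace with a sigmoid/ReLU for gradient-based training

if __name__ == "__main__":
    x = np.array([0.5, -1.2, 3.0])   # input features
    w = np.array([0.8, 0.1, -0.4])   # learned weights
    print(neuron(x, w, bias=0.2))    # prints 0.0 or 1.0
```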


TABLE IV: Summary of Existing Surveys on ML/DL

2017, [63]: Survey of ML techniques in cellular networks.
2018, [64]: DL for wireless networks; mainly focused on the applications of DL for different network layers and on DL to enhance network security.
2018, [65]: DL for IoT big data and streaming data analytics.
2019, [66]: Discussed DRL approaches in communications and networking.
2020, [67]: Integration of blockchain and ML in communications and networking.
2020, [68]: Relationship between ML and privacy protection in 6G networks.
2020, [69]: ML and DL for IoT security.
2021, [70]: Survey of DL and its applications.
2021, [71]: ML techniques for network optimization to meet end-to-end QoS and QoE.
2021, [72]: ML techniques in the edge network; mainly discusses model compression techniques, hardware, and software stacks.

Fig. 3: Deep Learning.

Deep Learning (DL) is a particular type of ML model based on ANNs with multiple non-linear processing layers that automatically extract complex representations from data and design their high-level abstractions [66]. As shown in Fig. 3, the input data passes through multiple hidden layers that perform some operations (e.g., matrix multiplications). The output of a layer is generally the input to the next layer, and the output of the final layer is either a feature or a classification output. In contrast to traditional ML algorithms, which first partition feature extraction and classification and then solve them separately, deep learning models integrate feature extraction and classification, and an end-to-end approach is adopted to solve the problem [76]. A DL model with many hidden layers in sequence is called a deep neural network (DNN). There are three main DNN models: Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).

a) Multilayer Perceptron (MLP): MLPs are a type of feedforward neural network comprising a series of fully connected layers (Fig. 4a). MLPs use the backpropagation method (supervised learning) for training. Backpropagation helps adjust the weights of the perceptrons to obtain the expected output, or an output closer to the expected one. MLPs are generally applied to classification and regression problems.

b) Convolutional Neural Networks (CNNs): CNNs (or ConvNets) contain multiple feature extraction layers, including a series of convolutional layers, pooling layers, and fully connected layers [77] (Fig. 4b). CNNs are commonly used in computer vision. A CNN learns to extract features from its inputs (e.g., images or videos) to achieve a specific task, such as image classification or pattern recognition. Compared to MLPs, CNN models extract features from inputs by executing mathematical (convolution) operations, in particular matrix multiplications. There are many CNN models, such as AlexNet [78], VGG-16 [79], GoogleNet [80], ResNet [81], SqueezeNet [82], and MobileNets [83]. More details about these CNN models can be found in [77], [84].

c) Recurrent Neural Networks (RNNs): RNNs contain intralayer recurrent connections, which make them different from MLP and CNN models [85] (Fig. 4c). RNNs are mainly used on time series of sequential input data to make predictions about future outputs (e.g., sales forecasting).

Fig. 4: Popular Neural Networks. (a) Multilayer Perceptron; (b) Convolutional Neural Network; (c) Recurrent Neural Network.

Fig. 5: DNN Training versus DNN Inference. (a) DNN Training; (b) DNN Inference.

2) ML/DL Training and Inference: In the context of machine learning, training is the process of teaching an ANN or a DNN, using a dataset, to achieve a specific AI task (e.g., image or voice recognition). Training is performed by feeding data to the ANN/DNN so that it can make a prediction about the type of data [86]. For example, suppose that we are training a DNN to differentiate three different objects, such as a cup, a car, and a bicycle, as shown in Fig. 5a. The first step is to gather a dataset that consists of thousands of images containing cups, cars, and bicycles. The second step is to feed the images to the DNN so that it can predict what each image is. If the prediction is inaccurate, the DNN is updated by correcting errors until more accurate predictions are obtained. The training process continues until the DNN makes predictions that satisfy the desired accuracy. Once the required accuracy is obtained, the DNN training is finished, and the trained model is ready to be used for inference (prediction).

Inference consists of using a trained ANN/DNN model to make predictions on novel data. ML/DL inference is accomplished by feeding new input data to the neural network, allowing the ANN/DNN to classify the input data. Considering the previous example (Fig. 5b), the DNN can be fed new input images of cups, cars, bicycles, and other objects. After a DNN is fully trained, it is simplified (compressed) before it can be deployed to a resource-constrained device, because fully trained DNN models require substantial computational resources in terms of storage, CPU/GPU, energy, and latency. On the other hand, compressing a DNN model will impact the model accuracy. In [87], the authors present a method to minimize the inference time on embedded devices while meeting the user requirements. They propose an adaptive method to determine, at runtime, the best DNN model to use for a given input. To achieve this goal, the authors use an ML technique to automatically build a predictor that quickly selects the optimum model to use. The predictor is trained off-line and then determines the optimum DNN by applying the learned model to the input data. The proposed method is applied to the image classification problem and evaluated on a Jetson TX2 embedded DL system, using the 50K images of the ImageNet ILSVRC 2012 validation dataset. The experimental results show that the proposed approach obtains a 7.52% improvement in inference accuracy and a 1.8x reduction in inference time over the individual DNN models.
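The training-then-inference workflow described above can be summarized in a few lines of PyTorch. The sketch below is generic (synthetic data and a tiny MLP classifier); it is not the setup of [87] or of any other surveyed work.

```python
# Minimal, generic PyTorch sketch of DNN training followed by inference.
# Synthetic data and a tiny MLP stand in for the image datasets and CNNs discussed above.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny MLP: input layer -> hidden layer -> output layer (3 classes, e.g. cup/car/bicycle)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Synthetic training set: 256 feature vectors with random labels
x_train = torch.randn(256, 16)
y_train = torch.randint(0, 3, (256,))

# Training: forward pass, loss, backpropagation, weight update
model.train()
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Inference: feed new data to the trained model, no gradients needed
model.eval()
with torch.no_grad():
    x_new = torch.randn(4, 16)
    predictions = model(x_new).argmax(dim=1)
print("predicted classes:", predictions.tolist())
```

A CNN or RNN would replace the nn.Sequential model, but the two phases (iterative weight updates during training, a single forward pass at inference time) remain the same, which is why inference is the phase typically pushed to resource-constrained edge devices.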


C. Resource Allocation

In the literature, the resource allocation problem in cloud/MEC is often referred to as: (i) the task offloading problem, i.e., deciding whether, where, how much, and what should be offloaded to the cloud/edge servers [5]; (ii) the resource provisioning problem, i.e., determining the adequate computation resources (e.g., servers) that will be used to execute each task; or (iii) the task allocation problem [88], i.e., ordering and mapping each task onto the best-suited computation resource. The term "task scheduling problem" is often used to refer to the combination of sub-problems (ii) and (iii) [10], and the term "joint resource allocation" is often used to refer to the combination of the task offloading and task scheduling problems, i.e., jointly offloading and scheduling the application's tasks onto the best-suited edge/cloud servers [4]. Hence, we follow the same motif throughout the rest of this paper and discuss the resource allocation problem from three main aspects: (1) the task offloading problem, (2) the task scheduling problem, and (3) the joint resource allocation problem.

1) Task Offloading: Task offloading is the process of transferring computation-intensive tasks to a set of remote computing machines (e.g., cloud or edge servers) that can process the tasks. An efficient task offloading strategy can significantly reduce the latency and the total energy consumption of the IoT devices [1]. For instance, the authors of [89] propose a secure task offloading mechanism that can minimize the computation load of virtual reality (VR) devices while satisfying the VR QoE and resisting malicious attacks. To this end, the authors introduce a blockchain to detect malicious attacks during task offloading and data processing, and use a reinforcement learning algorithm to properly allocate resources based on the defined QoE requirements. In the proposed mechanism, the main information of each viewport offloading is stored in a transaction <TXN_v, m>_{t-1} of a blockchain controller (BC) implemented at the edge access point (EAP), where TXN_v denotes the transaction ID, m denotes the BC ID, and t represents the time slot. As shown in Fig. 6, each EAP m needs to carry out task offloading and blockchain consensus in parallel at each time slot. There are two main phases for each EAP m to perform over time slot t, namely transaction generation and blockchain consensus. During the transaction generation at time slot t, the BC m generates transactions <TXN_v, m>_{t-1} according to the offloading records at time slot t-1. Then, the number of transactions gathered by BC m at time slot t, denoted by T_m(t), is calculated by (1):

T_m(t) = \sum_{v=1}^{V_m} x_{m,v}(t-1),    (1)

where x_{m,v}(t-1) is the offloading decision and V_m is the number of VR devices. The blockchain consensus process comprises five phases: request, pre-prepare, prepare, commit, and reply, as shown in Fig. 6. This consensus is based on the practical Byzantine fault tolerance (PBFT) protocol, which is widely used in the literature.

Fig. 6: Blockchain-enabled secure task offloading [89].
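Equation (1) simply counts the offloading decisions recorded in the previous time slot. A minimal sketch is given below; the list-of-lists layout for the decisions x_{m,v}(t-1) is an assumption made for illustration and is not taken from [89].

```python
# Hypothetical sketch of Eq. (1): T_m(t) = sum over VR devices v of x_{m,v}(t-1).
# x_prev[m][v] holds the binary offloading decision of device v at BC m in the previous slot.
from typing import List

def transactions_gathered(x_prev: List[List[int]], m: int) -> int:
    """Number of transactions gathered by blockchain controller m at slot t,
    i.e., the number of devices that offloaded to EAP m during slot t-1."""
    return sum(x_prev[m])

if __name__ == "__main__":
    # 2 EAPs/BCs, 4 VR devices each; 1 = task offloaded in slot t-1, 0 = executed locally
    x_prev = [[1, 0, 1, 1],
              [0, 0, 1, 0]]
    print([transactions_gathered(x_prev, m) for m in range(len(x_prev))])  # -> [3, 1]
```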
The challenges of task offloading in MEC are as follows [88], [27]:
• Decision on task offloading: A key step is to decide whether or not to offload the computational tasks of IoT devices. This decision may result in: i) local execution, where the task is processed locally by the IoT device due to cost constraints or MEC resource constraints; ii) full offloading, where the whole task is offloaded and processed by the MEC server; or iii) partial offloading, where the task is divided into two parts, one part executed by the IoT device and the rest offloaded to the MEC server [5] (see the sketch after this list).
• What to offload: Determining the parts of the IoT application that should be offloaded (i.e., offloadable parts) and the parts that should be executed locally (i.e., non-offloadable parts, e.g., user input or camera).
• Where to offload: Selecting the target computing infrastructure (e.g., MEC server, cloud server, or cloudlet) that will execute the offloaded tasks.
• How to offload: Answering this question solves the technical issues related to the task offloading mechanism. For example, a good task offloading scheme should satisfy the QoS requirements of the IoT application.
• When to offload: Determining the appropriate time for transferring the offloadable tasks from the IoT devices to the selected MEC servers.
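The sketch below illustrates the first challenge, choosing among local execution, full offloading, and partial offloading, by comparing simple completion-time estimates. The cost model (transmission plus computation, with the device and edge parts of a partial offload running in parallel) and all parameter values are hypothetical.

```python
# Hypothetical sketch: choose local execution, full offloading, or partial offloading
# for one task by comparing simple completion-time estimates.
def completion_times(cycles_g, data_mb, local_ghz, edge_ghz, uplink_mbps, offload_ratio=0.5):
    tx = lambda mb: mb * 8.0 / uplink_mbps                 # transfer time (s)
    local = cycles_g / local_ghz                           # everything on the device
    full = tx(data_mb) + cycles_g / edge_ghz               # everything on the MEC server
    # partial: device part and edge part are assumed to run in parallel
    partial = max((1 - offload_ratio) * cycles_g / local_ghz,
                  tx(offload_ratio * data_mb) + offload_ratio * cycles_g / edge_ghz)
    return {"local": local, "full": full, "partial": partial}

if __name__ == "__main__":
    times = completion_times(cycles_g=8.0, data_mb=4.0,
                             local_ghz=1.5, edge_ghz=10.0, uplink_mbps=50.0)
    decision = min(times, key=times.get)
    print(times, "->", decision)
```

ML/DL-based offloading schemes replace this fixed cost comparison with a learned policy, which becomes useful when channel conditions, server loads, and task characteristics vary over time.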


2) Task Scheduling: Task scheduling is the process of assigning an application's tasks to the computation resources and ordering their execution so that the dependencies between them are maintained while meeting the required QoS [10]. Efficient task scheduling mechanisms are vital for maximizing QoS performance [90], maximizing the revenue earned by MEC service providers [91], and minimizing the energy consumption and delay of the application [92]. Also, processing the high volume of data generated by billions of IoT devices on MEC servers requires appropriate task scheduling mechanisms. Besides, these data may require different QoS levels that a single MEC server cannot provide. Therefore, a proper task scheduling strategy is vital to meet the required QoS.

3) Use case of task offloading and scheduling in MEC: To help readers understand the task offloading and scheduling problem, we illustrate a use case from the perspective of the end-users, namely a virtual reality (VR) application, which is considered to be a key application market in the future [93]. We first suppose a full task offloading and scheduling scenario, where the offloading algorithm implemented in the edge device decides to offload the whole VR application, represented by a directed acyclic graph (or task graph), to the edge servers (see Fig. 7a). Then, the scheduling algorithm assigns a priority to each task and allocates the tasks to the available edge servers based on their priority. After the execution of the tasks, the result from the edge servers is sent back to the edge device. In this scenario, the scheduling length (makespan) is equal to 65. The second scenario is partial task offloading and scheduling, where the offloading algorithm decides to execute a part of the VR tasks locally while the rest (e.g., the computation-intensive tasks) is offloaded to the MEC servers (see Fig. 7b). The makespan in the second scenario is 45 because only three tasks are executed on the edge servers.

4) Traditional Techniques for Resource Allocation in MEC: Traditional resource allocation techniques can be categorized into approximation-based, heuristic-based, meta-heuristic-based, and game-theoretic approaches. Approximation methods find a quasi-optimal solution (within polynomial time [94]) to NP-hard problems that is guaranteed to be close to the optimal solution [95]. In the context of resource allocation, the methods for designing approximation algorithms include greedy-based [96], [97], [98], local-search-based [99], [100], primal-dual-based [101], [102], and LP-rounding-based [103], [104], [105] approaches.

A heuristic technique is defined as "any approach to problem-solving that employs a practical method that is not guaranteed to be optimal, perfect or rational, but is nevertheless sufficient for reaching an immediate short-term goal" [106]. Heuristic methods are designed for specific problems, and they can find reasonably good solutions in an acceptable time frame [10], [107]. In the context of resource allocation, heuristic techniques can be classified into three groups: list-based [108], [109], [110], [111], [112], [113]; clustering-based [114], [115], [116], [117], [118]; and duplication-based [119], [120], [121]. Among the three heuristic methods, the list-based approach is the simplest one, with quadratic time complexity, i.e., O(t² × p) for t tasks and p processors.

While heuristic techniques are designed for specific problems, meta-heuristic approaches are designed for general-purpose optimization problems [10]. In terms of the total execution time of an application (or makespan), meta-heuristic techniques outperform heuristic ones due to their ability to search a larger solution space. However, compared to heuristic methods, the running time of meta-heuristic algorithms increases rapidly as the number of tasks in the application increases [122]. Thus, meta-heuristic methods are not suitable for large-scale IoT applications. Meta-heuristic techniques can be categorized into genetic algorithms (GA) [123], [124], [125], particle swarm optimization (PSO) [126], [127], [128], ant colony optimization (ACO) [129], [130], [131], and simulated annealing (SA) [132], [133].

Game theory is a branch of applied mathematics that studies interactive decision-making, where the outcome for each player depends on the actions of all [134], [135], [136]. Game-theoretic approaches can be classified into two groups: the classical game, which assumes that all players are rational, and the evolutionary game, which considers that a player may play for his own interest and has limited information about the available choices of strategies [137].
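To make the meta-heuristic category concrete, the following sketch applies simulated annealing to a toy task-to-server assignment problem that minimizes makespan. It is illustrative only; the workloads, cooling schedule, and neighborhood move are arbitrary choices, not those of the cited works.

```python
# Hypothetical sketch: simulated annealing for assigning independent tasks to servers,
# minimizing makespan (illustrates the meta-heuristic category).
import math
import random

def makespan(assign, task_cost, server_speed):
    load = [0.0] * len(server_speed)
    for t, s in enumerate(assign):
        load[s] += task_cost[t] / server_speed[s]
    return max(load)

def simulated_annealing(task_cost, server_speed, temp=10.0, cooling=0.95, steps=2000):
    random.seed(0)
    n, m = len(task_cost), len(server_speed)
    current = [random.randrange(m) for _ in range(n)]
    best = current[:]
    for _ in range(steps):
        neighbor = current[:]
        neighbor[random.randrange(n)] = random.randrange(m)   # move one task to another server
        delta = makespan(neighbor, task_cost, server_speed) - makespan(current, task_cost, server_speed)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            current = neighbor            # accept better moves, and sometimes worse ones
            if makespan(current, task_cost, server_speed) < makespan(best, task_cost, server_speed):
                best = current[:]
        temp *= cooling                   # cool down: fewer uphill moves over time
    return best, makespan(best, task_cost, server_speed)

if __name__ == "__main__":
    tasks = [4, 7, 3, 9, 5, 2, 6]         # task workloads
    servers = [1.0, 2.0]                  # relative server speeds
    assignment, ms = simulated_annealing(tasks, servers)
    print(assignment, round(ms, 2))
```

The growing search cost as the number of tasks increases is visible here: each annealing step re-evaluates the full assignment, which is exactly the scalability limitation noted above for meta-heuristics.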


Fig. 7: Use case of task offloading and scheduling in MEC. (a) Full task offloading and scheduling; (b) partial task offloading and scheduling.

globally optimal. They have difficulties in adapting to various videos). Since the data is generated at the network edge, it is
QoS requirements and dynamic environments. In general, the more beneficial to analyze them at the network edge. In this
traditional resource allocation techniques have the following context, ML/DL techniques are necessary due to their ability
main limitations: to efficiently analyze and quickly extract features from a huge
• Computationally expensive: The execution time of tradi- volume of data. Additionally, to efficiently execute and analyze
tional methods proportionally increases with the increase the generated data, it is crucial to properly allocate (offload
in the application size (i.e., number of tasks), which and schedule) them to the edge computational resources, which
leads to extra overheads in terms of computational time. can satisfy the data requirements (e.g., latency, privacy, QoE).
Therefore, they are unappropriated for delay-sensitive and Since ML/DL are key techniques for data prediction, they
data-intensive applications. Also, meta-heuristic methods can accurately predict both data requirements and the MEC
such as GA are time-consuming since they maintain large computing nodes which will process the data.
solutions in memory.
• Slow convergence: The traditional resource allocation B. ML/DL-Enabled MEC: Use Cases from three Perspectives
methods have a slow convergence rate since they cannot In this section, we present ML/DL use cases from three
learn from previous sub-optimal solutions. perspectives (see Fig. 8): 1) end-users, 2) service providers,
• Lack of adaptability: The solutions obtained through
and 3) networking.
traditional resource allocation methods are sensitive to the 1) End-User Perspectives: The execution of ML/DL tasks
environment changing. They supposed that the computing in MEC, in particular, on edge devices is beneficial to the
environment is static and known by mobile users. If a end-users. In general, by running ML/DL at the edge of
parameter about the computing environment (e.g., the the network, the user QoE is maximized since they can
wireless channel information) changes, the optimization predict user’s requirements. For instance, a DL approach called
problem would be reformulated to take into account the LiveDeep is proposed to predict the user’s viewport for live
new changed parameter to achieve the desired objective. virtual reality (VR) streaming [138]. Facebook presents a data-
Therefore, conventional resource allocation schemes are driven approach to enable ML inference on smartphones with
not adaptable in time-variant dynamic environments. the objective to increase the QoE and reduce the latency
[139]. Running ML/DL tasks on edge devices also enables
III. ML/DL E NABLED MEC: U SE C ASES
the end-user to rapidly analyze and obtain its health report.
A. Why Do We Need ML/DL in MEC ? For example, HealthFog [140] is a framework that implements
The growing numbers of IoT devices connected to the Inter- deep learning in edge devices for automatic heart disease
net have generated a massive amount of data (pictures, audios, analysis. HealthFog delivers healthcare as a service using IoT

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 11

End-Users
Networking
Perspectives
Perspectives
Network Slicing
AR/VR  Image/Voice Fraud

recognition detection Resource


Content Allocation
caching
QoE
ML/DL-Enabled MEC
SDN/NFV Security
Video
Online
and
processing
Gaming Privacy
Healthcare Service Providers
Perspectives

Customers Profit
segmentation maximization
Data
Analyzing

Market
Prediction Data Collection

Fig. 8: ML/DL-Enabled MEC: Use Cases from three Perspectives

devices and efficiently executes user requests (i.e., the patient the associated base stations operated by the MEC service
heart information). Additionally, bringing ML/DL models to providers. The goal of the miners is to maximize their mining
the edge network enhances user privacy because the raw data rewards. To achieve these goals, a framework-based reinforce-
required for DL tasks is stored on the edge devices instead ment learning is proposed. The proposed framework enables
of the cloud. Moreover, bringing ML/DL techniques at the the service providers to maximize their long-term profit by
edge of the network enables rapid access to the huge volume dynamically adjusting the price per unit hash rate to a miner
of real-time data generated by the IoT devices for rapid AI while taking into account the highest price that the miner can
tasks, which in turn gives the devices the ability to respond to pay during the whole mining process. The framework also
real-time events in an intelligent manner. allows the miners to select the best response strategies, and
On top of this, ML/DL techniques enable end-users to therefore satisfy miners’ objectives.
intelligently offload their latency-sensitive applications (e.g., 3) Networking Perspectives: The third ML/DL use case is
online gaming and mobile augmented reality (MAR)) to other the one optimizing networking technologies for MEC such
devices or to the edge servers. Chakrabarti [141] proposes as network slicing [144], [145], [146]; NFV [147], [148];
a DRL-based mechanism to offload MAR application’ tasks SDN [149] [150]; mobility management [151], [152]; content
to the nearby devices. The authors of [142] propose a DRL- caching [153], [154]; resource allocation [155], [156], [157];
based joint task offloading and migration approach, where and security/privacy [158], [159].
DRL and LSTM are combined to solve the task offloading Network slicing is a networking technology that enables
problem in MEC networks. The proposed algorithm called network providers to split the physical network infrastructure
“Online Predictive Offloading (OPO)” uses LSTM to predict into multiple logical networks [160]. Abidi et al. [145] propose
the load of the edge server with the objective to improve the a 5G network slicing approach based on deep belief network
convergence speed and accuracy of the DRL model during the (DBN) and neural network (NN) (Fig. 9). Firstly, the number
offloading process. The experimental results shows that the of IoT devices (e.g., mobile phones, cars, and cameras) in
proposed approach reduces the latency by 6.25% on average. the 5G network is observed. Then, the attributes of the
2) Service Providers Perspectives: One of the benefits of devices (e.g., device type, packet information, bandwidth)
bringing ML/DL models to the edge of the network for the are collected. After that, the collected data are normalized
service providers or any stakeholder is the processing and into the interval [0,1] to reduce the redundant data. Then,
analyzing of data generated by users or IoT devices at the edge the feature extraction is performed by multiplying a weight
network. This significantly reduces the cost and latency of function with the attributes values of the network to obtain
sending data to the remote cloud for processing and analyzing. high-scale variation. The weight function is optimized by
Also, the analyzed data can be exploited for security and using a hybrid metaheuristic method, namely the glowworm
safety (e.g., park monitoring, fire prevention), or for marketing swarm optimization method and the deer hunting optimization
purposes (e.g., users segmentation to determine users wishes). method. Next, the network slicing prediction is performed
Besides, the service providers can also maximize the long- using DBN and NN to maximize the accuracy. The types of
term profit by intelligently selling their computation resources the predicted network slices are enhanced mobile broadband
to the end-users. For instance, the authors of [143] present (eMBB), massive machine type communication (mMTC), and
a public blockchain application in MEC with the objective ultrareliable low-latency communication (URLLC). The exper-
to maximize the long-term profit of the service providers. In imental results prove that the accuracy of the proposed hybrid
the public blockchain-enabled MEC network, each blockchain learning approach is better than the benchmark methods.
user (i.e., miner) offloads its proof-of-work puzzle tasks to For example, the simulation results proved that the proposed

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 12

5G Network A. ML/DL Tasks on-Edge-Devices


At the intelligence devices layer, ML/DL techniques are
applied for a range of applications, including virtual reality
Door Lock Camera Car Bycicle video streaming [161], image recognition [162], and pandemic
tracking [163]. The execution of ML/DL models on the edge
Set of features devices can reduce the latency of the training and inference
Device type Delay rate
Modulation Packet time [4]. However, it is computationally expensive for ML/DL
Speed Bandwidth type loss rate
models, in particular, DNN, to make training and inference
Proposed Algorithm
on resource-constrained devices due to millions of parameters
that need to be refined over several time periods. For this
Final Features X Weight Function reason, many studies have proposed strategies to reduce the
training and inference times of DNN models running on
Optimal Weight Features resource-constrained devices while optimizing the QoS and
QoE requirements. Such studies can have benefits throughout
Accuracy
DBN Classification NN
the MEC ecosystem by minimizing the latency of the DNN
Maximization
models while running on the IoT device or edge server. In
this section, we discuss key approaches for enabling ML/DL
Network Slicing Type tasks on smart devices such as model compression, hardware
eMBB mMTC URLLC
designing, and neural networks optimization.
1) Model Compression: The common strategy for enabling
Fig. 9: ML and DL based 5G Network Slicing [145]
ML/DL models on intelligent devices is to compress the
model. Model compression reduces the resource and compu-
Enabling Techniques for tational requirements, but it leads to an accuracy reduction
ML/DL Tasks in MEC
compared with the original model. The most popular models
compression techniques include knowledge distillation [164],
On-Edge-Devices On-Edge-Servers
Across devices, pruning [165], and quantization [166].
edge/cloud servers
The Knowledge Distillation (KD) method (also called as
Model Model Joint Model
Compressing Offloading Offloading and
“student-teacher networks”) transfers the knowledge learned
Hardware Model
Partitioning from a larger DNN model (teacher model) to a model that
Designing Partitioning
Federated has fewer parameters and layers (student model) (see Fig.
Model Model Learning 11a). The first step consists of training a larger DNN to
Optimization Caching
generate labeled data. After that, the generated data is used to
Fig. 10: Major Enabling Techniques for ML/DL Tasks in MEC train a smaller and shallower mimic model, then the student
model is deployed. Several studies have used this model
compression technique. For instance, the authors of [167]
combine knowledge distillation and auto-encoder methods to
approach is 0.61% better than the PSO+NN+DBN based visually interpret and diagnose image classifiers. Furthermore,
approach in terms of accuracy. a small locally accurate model is trained to mimic the behavior
of an original cumbersome DNN (big model) around one
image of interest. In this approach, knowledge distillation is
IV. E NABLING T ECHNOLOGIES FOR ML/DL TASKS IN used to transfer the knowledge from the big model to the small
MEC model, while the auto-encoder is used to generate neighbors
As the number of IoT devices increases, the need for around the image of interest. Tanghatari et al. [168] propose a
researchers to understand how to design architectures that inte- knowledge distillation approach to distribute the DNN training
grate ML/DL training and inference with MEC grows rapidly. over IoT edge devices with the objective to protect data privacy
Additionally, given the fact that MEC systems are distributed, on the edge devices and decrease the load on cloud servers.
a key question that arises is “where should we perform the Furthermore, the knowledge of the main network is transferred
training and inference and where should we deploy the fully to the generated small network. The experiments results show
trained model in MEC ?”. In the literature, they are different that the proposed approach preserves the IoT device data
approaches for performing ML/DL training and inference in privacy and obtains on average 2.3% accuracy loss compared
MEC. Here, we discuss three major approaches (see Fig. 10): to the conventional centralized training on the cloud.
1) On-Edge-Devices, where the ML/DL models training and Pruning is a powerful compression method that removes
inference are executed on the IoT device; 2) On-Edge-Servers, redundant parameters of a neural network that are not neces-
where data generated from the IoT devices are offloaded to sary for training or inference (Fig. 11b). In [169], the authors
one or more edge servers for training/inference; and 3) models propose a pruning technique based on activation maximization
training and inference across the edge devices, edge servers, for CNN model acceleration and compression. Activation
and cloud servers. maximization is a simple method to visualize the features

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 13

Training

Data DNN Model

(4-Bit
DRR3 Memory
Quantize +
Knowledge Transfer
Input Data)

KNOWLEDGE
Memory
Controller
Student model
Input/Output
Model
Teacher model
Deployed in Buffers
Mobile
Device
FantastIC4
Control Unit
(a) Knowledge Distillation
FantastIC4
Original DNN Model Pruned DNN Model Accelerator

Pruned DNN Fig. 12: FantastIC4 hardware accelerator architecture for DNN
Model
Deployed in
Mobile Device
[181]

(b) Pruning 2) Hardware Designing: Although model compression is a


Fig. 11: Model compression techniques enabled ML/DL tasks powerful technique to enable model training and inference on
on edge device edge devices, it can impact the final accuracy of the model.
For instance, the pruning technique can reduce the model size
but does not improve the training or inference time [182].
of a trained neural network with the objective to maximize Also, most of the existing model acceleration approaches
the activation of certain neurons [170]. The experimental used compression techniques to reduce the complexity of the
results based on RadioML2016.10a dataset [171] show that model. However, few compression techniques are implemented
the proposed model obtains a higher accuracy compared to the on popular DL frameworks such as Tensorflow Lite and
weight sum (WS) [172] and average percentage of zeros [173] Core ML [176]. For these reasons, recent studies have been
approaches. PruneFL [174], a federated learning approach with focused on hardware architecture to facilitate the execution of
adaptive and distributed parameters pruning is proposed to ML/DL tasks on resource-constrained devices. For instance,
minimize the training time on edge devices while ensuring a the authors of [181] present FantastIC4, a new hardware
similar accuracy as the original model. PruneFL has two main architecture, which efficiently executes highly compact repre-
phases: initial pruning at a selected edge device and further sentations of DNNs based on fully-connected layers. As shown
pruning which involves both the edge device and edge server in Fig. 12, the FantastIC4 hardware-accelerator system is a
during the federated learning process. combination of a CPU and an FPGA. FantastIC4 has three
Concerning DNN quantization, it aims to reduce the number main parts: the software program, the DDR3 memory, and
of bits required to store the weights of the neural networks the hardware architecture on the FPGA chip. The software
[175] [176]. Coelho et al., [177] propose a novel heteroge- program comprises the CPU that transfers the input data and
neous quantization approach to minimize the energy consump- DNN model to the FPGA chip. Since the input data is usually
tion of the DNN model on Field Programmable Gate Arrays very large, and cannot fully be stored on an on-chip BRAM
(FPGAs) while achieving high accuracy. The evaluation of (Block RAM), some of the data is stored in an off-chip
parameters quantization during DNN training on edge device DRAM Dynamic RAM). The input data is accessed through
is the objective in [178]. Furthermore, it aims to understand a memory controller built across a MIG (memory interface
how to select the quantization parameters during training generator) IP. Concerning the FPGA chip part, it comprises
to optimize neural networks for inference. The authors of the FantastIC4 control unit, memory controller, I/O Buffers,
[179] propose a quantization method in federated learning to and the FantastIC4 accelerator. The memory controller aims
enhance the efficiency of data exchange between edge servers to facilitate the transfer of the input data from the off-chip
and cloud servers. In this approach, the model training is DRAM to the accelerator, then stores the execution results into
performed on the edge servers, and the model aggregation the DRAM. The control unit manages the behaviour of other
is done on the cloud servers. The main idea is to quantize modules on the FPGA, the data movement, and the process
the neural network weights when the models are transmitted inside the accelerator. Concerning the I/O buffers, they store
from the edge servers to the clouds and vice versa. Experiment the data for execution and cache the PSum data from the
results based on WikiText-2 [180] show that the proposed accelerator for inference of the subsequent layer. Finally, the
method reduces up to 19× the volume of data exchanged heart of the system is the FantastIC4 accelerator, which reads
between the edge servers and cloud servers. Simulations the input data from the DRAM, performs the execution, and
results also prove that the impact on the validation loss of caches the results into the DRAM memory.
the final model is around 5%. CMSIS-NN [183] is an efficient kernel designed to max-

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 14

imize the performance and minimize the memory footprint B. ML/DL Tasks on Edge Servers
of neural networks on Arm Cortex-M CPUs targeted for IoT While model compression and architecture optimization en-
devices. MCUNet [184] is another framework designed for ef- able to run ML/DL tasks on edge devices, it is still challenging
ficient neural network architecture (TinyNAS) and lightweight to deploy large DNNs models on resource-constrained devices
inference engine (TinyEngine) on microcontrollers. TinyNAS with limited power, computation, and storage in real-time.
first optimizes the search space to fit the resource-constrained Therefore, resource management approaches including task
devices and then performs a neural network architecture offloading, task partitioning, and content caching are good
search within the optimized space. TinyNAS is co-designed strategies to address this challenge.
with TinyEngine to enhance the search space and fit large 1) ML/DL Tasks Offloading: Offloading ML/DL tasks from
models. TinyEngine reduces the memory usage by 3.4× and edge devices to more powerful servers such as edge servers or
accelerates the inference by 1.7-3.3× compared to CMSIS-NN cloud servers is a good choice. Since the edge server is close
[183]. Simulation results based on ImageNet proved that the to edge devices, it is natural to offload ML/DL computational
MCUNet framework achieves 70.7% accuracy and accelerated tasks to the edge servers rather than the cloud. In [192], the au-
the inference of wake word applications by 2.4-3.4×. thors propose an algorithm called “Multiple Algorithm Service
The authors of [185] present a hardware accelerator based Model (MASM)” to offload AI tasks to the cloudlet servers
on quantum annealer. Quantum annealer is a hardware ar- with the objective to minimize the energy consumption of the
chitecture for discrete optimization problems. The proposed servers and the offloading delay cost while meeting the quality
architecture outperformed GPUs and quantum annealers in of the results (QoR). DNNOff [193] aims to automatically
terms of energy consumption. Hardware-Aware Automated determine the DNN tasks that should be offloaded to the edge
Quantization (HAQ) [186] is a hardware architecture that servers. To achieve this, the DNNOff algorithm first extracts
uses reinforcement learning to automatically determine the the structure and parameters of the deep neural network model,
quantization policy. HAQ also includes hardware architecture then a random forest regression model is used to predict the
into a loop to minimize the latency, energy, and storage on execution cost of each layer. Finally, the DNNOff algorithm
the target hardware. Compared with conventional architectures uses the prediction model to determine the parts that should
(e.g., fixed bit width quantization), HAQ reduces the latency be offloaded to the edge servers. The experiments based on
by 1.4-1.95× and the energy consumption by 1.9× with good real-world DNN applications with AlexNet, VGG, and ResNet
accuracy. models show that the DNNOff algorithm reduces the response
There are other hardware accelerators for DNN tasks that time by 12.4–66.6%.
have been proposed in the literature. For instance, DNNBuilder 2) ML/DL Tasks Partitioning: He et al. [194] propose DNN
[187] can automatically build high-performance DNN hard- tasks offloading approach to minimize the end-to-end inference
ware accelerators on FPGAs with the objective to satisfy the delay. To achieve this goal, a tandem queueing model is used to
throughput and latency requirements of both cloud and edge analyze queueing and processing delays of DL tasks. Tandem
devices. Kernel decomposition is another approach of hard- queueing models are queueing theory models that consider
ware accelerators for DNN models. For example, ESCALATE the possibility that an end-user may request services from
[188] is an algorithm-hardware co-design for CNN accelerator many sequentially arranged servers [195]. The authors of [194]
based on kernel decomposition technique. first formulate the problem as a joint optimization problem,
3) Neural Networks Models Optimization: Artificial net- that is, DNN partitions deployment and resource allocation
work networks (ANNs) model optimization is another key problems. Then, an algorithm based on Markov approximation
approach to achieve high accuracy for ML/DL tasks. Recent is used to solve the problem. Simulation results prove that the
studies have been focused on the architecture design of exist- proposed algorithm reduces the average end-to-end inference
ing ANN models to create optimized ANN models instead of delay by 25.7%. In [196], the authors also present DNN tasks
using compression methods to reduce their complexity. For partitioning and offloading algorithm with the objective to
instance, the authors of [189] propose an optimized ANN minimize the processing delay and the computing burden of
model to predict the thermal efficiency and water yield of edge devices. Compared to [194] which used the Markov
solar still. The optimized ANN model used PSO and HWO approximation method, in [196], a Mixed Integer Linear
(Humpback whale optimizer) [190] to optimize the traditional Programming (MILP) is used to solve the partitioning and
ANN model. Simulation results prove that the proposed opti- offloading problems. The experiments results show that the
mized ANN model achieves the highest prediction accuracy proposed algorithm obtains up to 90.5% and 69.5% processing
compared to ANN, ANN-HWO, and ANN-PSO. RouteNet delay reduction compared with the MEC-server-only scheme
[191] is another NN model optimization, which is based (i.e., all DNN tasks are offloaded to the edge server) and
on graph neural networks. RouteNet can optimize network mobile-device-only scheme (i.e., all DNN tasks are processed
representation in SDN. Compared to the conventional well- locally on the mobile devices), respectively. The work pro-
known NN (e.g, CNN, RNN) which are not designed to posed in [197] extends [196] by considering not only the
learn graph-based data, RouteNet is able to learn the complex processing delay but also the energy consumption and price
relationship between topology, routing, and input traffic of a paid for DNN tasks execution on the edge server.
graph-structured network. The experimental results show that 3) Content Caching: Edge servers can cache locally the
RouteNet obtains accurate delay prediction (Mean Relative user’s related data near the location where the data have been
Error) of 15.4%. generated to reduce latency. Therefore, content caching can

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 15

Cloud

EDGE DEVICE
Common to cache hit or miss
Encoder
Only on cache miss
Cloud
Fetch request
1 2
....... Encode
Cache
4 Enqueue Cloud
5
Interface Connector
Frame buffer
8 Response callback 6 Cloud

response
label_1, score
Lookup
7
...................
Infer
 label_n,score)

3 Cache response

frame_ID
4 9 (label_1, score),....
Key1
(label_n,score)
(label_1, score), ...
Key2
(label_n,score)
Semantic Cache

(label_1, score), ...


KeyM
(label_n,score)

Fig. 13: Semantic cache to perform AI inference in edge [198]

enable model inference in the edge network. In [198], the


authors propose a semantic cache approach to perform AI DNN Tasks Partition

Offloading
inference on unstructured data in edge nodes, which reduces
the volume of data that needs to be sent to the remote cloud  1. Inputs data

 1. Receive Partitioned DNN

server. As shown in Fig. 13, the user first submits the input  2. Determine partitioning points
 3. Perform part of DNN tasks

 2. Perform rest of DNN tasks

 3. Return result

data (e.g., image or video) for AI inference to the cache service  4. Offload rest of DNN tasks

via the cache interface. Then, the interface forwards the input Fig. 14: Joint DNN tasks partitioning and offloading
data to the encoder for features extraction. The encoder first
searches in the cache. If there is a cache miss, then the image is
sent to the cloud server which will perform the inference. The the partitioning points and perform some part of the DNN
inference result from the cloud is stored in the cache indexed tasks on the edge device (vehicle). The rest of the DNN
by a key, and finally, it is sent back to the user. task is offloaded to the edge server, which will identify the
CacheNet [199] is a novel DNN model caching framework, partitioning point, processes it, and returns the result to the
which caches the low-complexity DNN models on the end vehicle.
devices and the high-complexity DNN models on edge/cloud The authors of [201] propose a joint multi-device DNN
servers. The basic idea of CacheNet is inspired by the caching partitioning, offloading, and allocation mechanism. The main
approach in computer architecture, where the computer ele- objective is to minimize the maximum DNN execution latency
ments (e.g., register, cache, RAM) are separated by memory among all the edge devices, and therefore reduce the global
hierarchy based on response time. Compared to the memory latency. In the proposed approach, multiple devices cooperate
hierarchy in a computer that only stores data, CacheNet stores with an edge server, and each device can make a DNN
DNN models. Especially, CacheNet generates multiple small partitioning decision on its own DNN model. The offloaded
sub-models. Then, each sub-model captures a partition of DNN layers of edge device are executed on the edge server
the knowledge represented by the large DNN model instead to accelerate the learning process. To partition the DNN tasks,
of training a single large-scale DNN model. Experiments the logical layers are divided into two types, one type that
results based on CIFAR-10 [200] and FVG dataset prove that will be executed locally on the device, and another type that
CacheNet is 58 - 217% faster than the benchmark approaches. will be executed on the edge server. Only intermediate outputs
are offloaded from an edge device to an edge server. The
C. ML/DL Tasks across Edge Devices, Edge Servers, and offloading decision at the device side is modeled as an integer
Cloud Servers variable Si ∈ {0, 1, 2..., k}, denoting that the layer 0 to Si are
1) ML/DL Tasks Offloading and Partitioning: The most executed locally on the edge device while the rest of the layers
widely approach used to efficiently enable ML/DL inferences are offloaded to the edge server. The simulation results prove
on MEC is tasks offloading and partitioning among the par- that the proposed approach outperforms the local-execution
ticipating nodes in MEC (i.e., edge devices, edge servers, method by 67.6% and the edge-only-execution scheme by 41%
and cloud servers). As shown in Fig. 14, the joint DNN when there are enough resources and bandwidth.
tasks partitioning and offloading approach first determines In [202], the authors present an efficient energy-aware DNN

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 16

offloading algorithm for intelligent IoT systems in cloud-edge Policy aggregation

environments. The main objective is to minimize the overall


DRL DRL DRL
energy consumption of all participating nodes (i.e., devices,
edge servers, and cloud servers). The offloading algorithm
called SPSO is based on two meta-heuristic methods, namely Edge

 Server
the particle swarm optimization (PSO) and the Genetic Algo-
System 1 System 2 ... System k
rithm (GA). Layers partitioning is also introduced to reduce
the encoding dimension and the execution time. The DNN Policy redistribution
partitioning is performed as follows: firstly, the branches in
a DNN are divided into isolated modules. Then, for each Fig. 15: Federated learning for DRL [208]
module, the actual layer is initialized as the start layer. After
that, every two adjacent layers are checked based on a defined
fitness function. Once a partition point is found, the actual the distributed systems share a DRL model that represents
layer is updated to the next layer. This process is repeated the learning policy as shown in Fig. 15. Also, each system
until the last two adjacent layers are checked. Finally, the k learns its local policy model Wk by a DRL method. The
layers between each two adjacent partition points are merged edge server updates a central policy model Wcs by aggregating
to form a deployment unit. the learned policy models from the systems Wk ’s. Then, the
Qadeer and Lee [203] investigate the computation and updated central policy model of the server is redistributed
wireless resource allocation problem in edge-based cloud to each distributed system which replaces its local policy
(with limited computational resources) and traditional cloud model with the central one. By repeating this process, the FL
environments. Specially, they propose an algorithm based on mechanism can accelerate the learning speed of the resource
a deep deterministic policy gradient with a pruning approach. allocation policy and ensure adaptability to newly-arrived
The learning process is performed on the edge-based cloud to systems because of the central policy model at the edge server.
achieve dynamic resource allocation for the edge devices. Ex- The authors of [207] also apply FL to accelerate the training
perimental results show that the proposed algorithm achieves of the DRL agents for task offloading in MEC. The FL recur-
up to 55% reduction in terms of operational cost, and up to sively selects a random set of IoT devices to first download
86.5% reduction in rejection rate on average. Furthermore, the parameters of the DRL agents from the edge server, then
the proposed algorithm obtains up to 115% gain in terms uses their own data to perform the training process on the
of QoE. Another offloading approach in edge-cloud networks downloaded model. Finally, it uploads only the updated model
is proposed in [204] with the objective to minimize the parameters of the DRL agent to the edge server for model
task processing delay. To achieve this, a DRL method, in aggregation. This FL-based approach enables resource-limited
particular, a DQN is used to learn the optimal offloading devices to learn a shared DRL agent without centralizing the
schemes through exploration and exploitation processes. Shi training data.
et al., [205] investigate the trade-off between the inference In [209], the authors propose a task offloading and resource
latency and data privacy during the DNN tasks partitioning in allocation algorithm based on federated learning and DRL. The
a MEC network. algorithm distributed the DRL tasks from the edge servers to
2) Federated Learning: Federated Learning (FL) is another the edge devices for training, which improve the accuracy. The
powerful enabling approach for ML/DL models training and proposed algorithm called FDOR has four components: of-
inference in MEC while guaranteeing data privacy. FL is floading action generation, offloading policies updating, DNN
a decentralized ML approach that allows smart devices to model aggregation, and adaptive learning rate approach. The
cooperatively train a shared learning model while keeping the authors of [210] present a gradient-descent-based federated
raw data on their devices, thus protecting their privacy. In FL, learning approach, which comprises two main phases: local
the edge devices (e.g., mobile phones) use their local dataset update and global aggregation. In the local update phase,
to collaboratively train and learn the model required by an FL each edge node performs gradient descent to locally adjust
server without sending their data. Then, they send the model the model parameters with the objective to minimize the loss
updated to the server for aggregation. These steps are repeated function defined on its own raw data. In the aggregation step,
several times until an expected accuracy is obtained. the model parameters obtained at each edge node are sent to
Although, the learning techniques such as RL and DRL an aggregator, which regroups the parameters and sends back
can effectively address the resource allocation problem in an updated parameter to each edge node for the next round of
wireless networks, their learning speed may be slower in iteration.
complex networks. Indeed, in a complex wireless network, a The authors of [208] investigate resource optimization tech-
new learning policy should be updated for a newly-arrived niques in FL. Especially, an NN-aware resource management
system because of the lack of network adaptability [206]. mechanism based on FL is proposed, where the sub-networks
To solve this issue, recent efforts focus on the application of the global model are assigned to the mobile clients based
of FL for DRL. For instance, the authors of [206] present on their local resources. They also present a use case of FL,
a federated learning framework for resource allocation in namely virtual keyboard application (VKA) used by Google
wireless networks with the objective to accelerate the learning AI group [211]. The VKA on mobile devices uses a natural
speed. In the proposed FL framework, the edge server and language processing (NLP) DL model to predict a word. The

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 17

model compression, which reduces the DNN model inference


times. This approach has advantages in MEC by minimizing
the latency of DNN models running on resource-constrained
edge devices. However, the drawback of model compression
Aggregation NLP for word prediction techniques is that they do not guarantee to maintain the
Go
+
Upload local model 
We go | original model accuracy. In other words, the compressed DNN
q1 w2 e3 r4 t5 y6 u7 i8 o9 p0
Central
Server
Global 0
a s d f g h j k l
model can obtain an accuracy less than that of the original
model
We
z x c v b n m

?123 , .
model. The second approach to facilitate the execution of
ML/DL tasks in MEC is hardware architecture designing. The
main challenge of these architectures is that they are designed
Training with

for a specific DNN model as explained in the previous section.


private data
Download global model
Therefore, it is challenging to design an architecture that can
support different DNN models.
Fig. 16: Federated learning: virtual keyboard use case [208] The third approach allowing DNN training and inference in
MEC is ML/DL tasks offloading from the resource-constrained
device to powerful servers (e.g, edge servers or cloud servers).
process of VKA with FL is described as follows (Fig. 16): Generally, for latency-aware applications, the tasks are of-
firstly, the mobile device uploads the global model from the floaded to other edge devices or to the edge servers, which
central server, then enhances it by training on local data. After are close to the specific edge device. It is important to notice
that, the updated local model is sent to the central server that the models compression method can be jointly used with
through a secure encrypted communication link. Next, the the offloading technique to meet QoS requirements as in [228],
global model is improved by aggregating the updated local [229]. The main challenge of offloading methods is to decide
models to the central server. This iteration of model training which part of the DNN tasks should be offloaded and where
and aggregation is repeated until the global model converges. to offload them. Scheduling the offloaded DNN tasks is also
In [34], the authors present a comprehensible survey of challenging during DNN inference in MEC. Therefore, it is
federated learning in MEC networks. They discuss several crucial to investigate the joint DNN tasks offloading and
aspects of FL including fundamentals of FL, applications of scheduling in MEC. Finally, a natural question that arises is
FL in MEC, and challenges. how ML/DL methods can improve the tasks offloading and
scheduling (i.e., resource allocation) in MEC. The answer to
this question is the purpose of the next sections.
D. Lessons Learned
In this section, we discuss lessons learned from enabling V. ML AND DL FOR R ESOURCE A LLOCATION IN MEC
techniques for ML/DL tasks (i.e., training and inference).
Table V summarizes the potential techniques that enable to A. Motivation Example
quickly perform ML/DL tasks in MEC. As shown in Table To illustrate the importance of ML/DL methods for resource
V, the majority of the enabling approaches for ML/DL tasks allocation in MEC, let’s consider that we would want to write
in MEC are evaluated using software stacks such Raspeberry an algorithm to offload “untrusted tasks” of an applica-
Pi, PyTorch, TensorFlow, and QKeras. ImageNet [219] dataset tion to an edge server. By using traditional programming
is widely used to evaluate the performance of the proposed approaches (Fig. 17), we would first look at what an “untrusted
ML/DL approaches because it contains enough labeled high- task” looks like. We might observe that some characters or
resolution images belonging to different categories, which words (such as “3pay”, “money”, ”free”, “bank”) tend to
facilitate the training of large CNNs [78]. appear several times. Perhaps we would also remark some
Although there are several hardware and software stacks patterns in the mobile user’s id, name, location, and so on.
(e.g., edge tensor processing unit, FPGAs, Tensorflow Lite, Then, we would write a discovery algorithm for each of
Core ML, and EdgeML [72]) for ML and DL techniques, there these patterns that would flag a task as “untrusted task” if a
is a need to evaluate these tools and propose a standard testbed number of these patterns are found. Finally, we would test the
for ML/DL tasks in MEC. A standard testbed consisting of algorithm and repeat the above steps until it is good enough.
neural network models, datasets, networking models, IoT de- Since this problem is not trivial, the algorithm will likely
vices, edge/cloud servers, and resource allocation mechanisms become enough hard to maintain.
is perceptible in enabling ML and DL tasks in MEC. In contrast to the traditional approaches, an ML-based “un-
In MEC, DNN training and inference are mainly performed trusted tasks” offloading algorithm automatically learns which
across edge devices, edge servers, and cloud servers. The main characters and words are good predictors of “untrusted tasks”
reason is that DNN training and inference require not only by discovering unusually frequent patterns of characters or
more computing power, but also less latency that a single words (Fig. 18). Also, the ML-based mechanism is much eas-
computing node (e.g., edge only, edge server only, or cloud ier to maintain and more accurate. Indeed, if malicious senders
server only) cannot provide. remark that all their tasks containing the word “3pay” are
There are mainly three types of strategies to enable fully blocked, they might start writing “pay” instead. An “untrusted
DNN models training and inference in MEC. The first one is tasks” offloading algorithm using traditional approaches would

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 18

TABLE V: Summary of Works focused on Enabling ML/DL Tasks in MEC


Computation Layer Ref. NN Models Techniques Application Simulation Tools Dataset Key Metric
2021, [179] LSTM Quantization Image classification PyTorch WikiText-2 [180], Validation loss
MNIST, Bar Craw
2019, [186] MobileNet-V1 Quantization Xilinx Zynq-7020 FPGA ImageNet Latency, energy, and
and Xilinx VU9P model size
2021, [181] MLPs, ResNet-50 and Hardware accelerator Hand-gesture recog- System Verilog + Mentor Google speech sommands, Throughput and power
ResNet-34, ResNet-20, nition, image classi- Graphics Simulator MNIST, and CIFAR-10 consumption
and EfficientNet-B0 fication
2019, [167] LeNet [212] and VGG16 [213] Knowledge distillation Image classification TensorFlow, Javascript, MNIST, QuickDraw, and Interpret and diagnostic
and Auto-encoder and Flask library CelebA [214] image classifiers with high
accuracy
2022, [168] LeNet5, AlexNet, MobileNet- Knowledge distillation - TensorFlow CIFAR-10 Privacy preserving
V3, VGG-19, ResNet-18,
ResNet-32
2018, [87] Inception, ResNet, and Mo- KNN Image classification Jetson TX2 embedded DL ImageNet ILSVRC 2012 Minimize inference time
bileNet validation dataset while meeting user QoS
On Edge Devices 2020, [169] VT-CNN2 Pruning based on activa- Automatic modula- TensorFlow RML2016.10a dataset Model acceleration
tion maximization tion classification [215]
2019, [174] Conv-2 [216], VGG-11 [79], Pruning Image classification PyTorch, Raspberry Pi FEMNIST [216], CIFAR- Accuracy and time trade-
ResNet-18 [217] 10 [218], and ImageNet of
[219]
2021, [177] Fully-connected NN [220] Quantization Particle QKeras [221], AutoQK- Hls4ml lhc jet dataset Energy minimization and
identification eras, and hls4ml [222] accuracy
2019, [178] Resnet-v1-50 Quantization TensorFlow Cifar10 Accuracy
2019, [192] - Offloading - MATLAB Numerical Energy consumption, de-
lay cost, and QoR
2022, [193] AlexNet, VGG, and ResNet Offloading Image recognition Python and Caffe2 unknown Response time
2020, [194] VGGNET-16 Offloading + Partitioning NVIDIA Tesla V100 GPU Numerical End-to-End Inference de-
lay
On Edge Servers 2020, [223] MobileNet, Inception, and Graph-based partition Unknown Scikit-learn, Caffe Mobility datasets [224] Throughput
ResNet and CRAWDAD [225]
2019, [196] , VGG16, VGG13, a Offloading + partition- Unknown Orange Pi Win Plus, Numerical Delay
ALEXNET, and LENET ing (MILP) MATLAB
2018, [198] RESNET-152 Caching Object classification NVIDIA Tegra TK1 + Youtube-Objects video Latency
Raspberry Pi3
[199], 2020 CacheNet, Shake-Shake and Caching - TensorFlow, TensorFlow CIFAR-10 and Frontal Latency
ResNet Lite and NCNN View Gait (FVG) dataset
[226]
2021, [201] MobilenetV2 and VGG19 Partitioning Unknown Raspberry Pis, NVIDIA Unknown Latency
Jetson Nanos
Across devices, edge 2022, [202] AlexNet, VGG19, GoogleNet, Offloading Unknown Python Unknown Energy Consumption
and cloud servers and ResNet101
2022, [203] Conv1D and gated recurrent Resource allocation + Python Numerical Operational cost, rejection
unit (GRU) Pruning rate, and QoE
2022, [227] MobileNetV1 Reinforcement learning Image classification AWS a1.medium, AWS Numerical Response time and Accu-
a1.large, ARM-core, and racy
ARM-NN SDK
2022, [209] Fully connected DNN Offloading, FL Wireless communi- PyTorch Numerical convergence speed, execu-
cation tion delay
2019, [210] Squared-SVM, Linear regres- Federated learning Unknown Raspberry Pi MNIST, SGD, DGD, Minimize the loss func-
sion, K-means, and CNN CIFAR-10 tion

Intelligent Device Layer MEC Layer Intelligent Device Layer MEC Layer

Edge Servers ML Approach Edge Servers


Smart device
Smart device
Traditional Approach
Data

Initiate Initiate
1) 1)

Problem:
Problem:

Set of
''Untrusted Define Set of
''Untrusted Train ML
Evaluate Evaluate
Tasks 2) Tasks''
rules Tasks Tasks''
Algorithm
Offloading 2)
Offloading
4)
4)
Analyze Analyze Offloaded

Error Offloaded
Error Task Queues
Task Queues

3) 3)

Fig. 17: Traditional approach for untrusted task offloading. Fig. 18: ML approach for untrusted task offloading.

need to be updated to flag the word “pay”. If malicious senders recently, many manufacturing factories are integrated edge
keep working around the traditional offloading algorithm, we computing and AI to solve the daily job scheduling problem.
will need to keep writing new rules forever. In contrast, an Hence, this tutorial will help both generalists and specialists to
ML-based “untrusted tasks” offloading algorithm will auto- understand how ML and DL can be used to solve the resource
matically observe that “pay” has become unusually frequent allocation problem in MEC. The flow shop scheduling problem
in “untrusted tasks” flagged, and it starts flagging them without can be formulated as follows: given a set of J jobs, the tasks
writing new rules (Fig. 19). of each job need to be processed by m machines (servers)
Now, suppose that we want to schedule the offloaded tasks in a specific order. In our case, we have one job with 3
to the MEC servers (Fig. 20). To provide a more realistic tasks that need to be scheduled on 3 servers (Table VI). The
scenario, we consider the problem as a flow shop scheduling main challenge in flow shop scheduling is to find the optimal
problem, which is widely used in manufacturing [230]. Also, sequence in which the tasks will be executed. For instance, if

© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanjing Univ of Post & Telecommunications. Downloaded on August 19,2022 at 04:57:27 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in IEEE Communications Surveys & Tutorials. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/COMST.2022.3199544

DJIGAL ET AL.: MACHINE AND DEEP LEARNING FOR RESOURCE ALLOCATION IN MULTI-ACCESS EDGE COMPUTING: A SURVEY 19

Intelligent Device Layer MEC Layer TABLE VI: Processing time of 3 tasks on 3 machines
ML Approach Edge Servers
Smart device
Task M1 M1 M3
Data

Update Data Initiate T1 4 3 1


1)

Set of

Problem:

''Untrusted
Automatically T2 3 5 5
Train ML
Tasks 2) Tasks''
Evaluate
Algorithm
Offloading
4)
T3 3 2 4
Analyze Offloaded

Error Task Queues

3)
First episode:
Fig. 19: ML approach for untrusted task offloading: automat- • Current state: [T1 , T2 , T3 ];
ically adapting to environment. • Select a random action: T2 ;
• Sequence : T2 ;
Intelligent Device Layer MEC Layer 1
• Reward: R([T1 , T2 , T3 ], T2 ) = =
Smart device
ML Approach Edge Servers makespan(T2 )
1
Data = 0.0769;
Update Data Initiate 3+5+5
1)

Automatically
MEC SCHEDULER
• Calculate the new Qπ (s, a): Qπ ([T1 , T2 , T3 ], T2 )
Problem:

Set of
''Untrusted Train ML
Evaluate
Tasks 2) Tasks''
Algorithm
h
QL
Offloading 4)
= Qπ ([T1 , T2 , T3 ], T2 ) + α R([T1 , T2 , T3 ], T2 )
Offloaded

Analyze Task Queues


Error
+γ max[Qπ ([T1 , T3 ], T1 ), Qπ ([T1 , T3 ], T3 )]
3)
i
−Qπ ([T1 , T2 , T3 ], T2 )
Fig. 20: QL-based offloaded task scheduling. h i
= 0 + 0.4 × 0.0769 + 0.8 × max[0, 0] − 0 = 0.03076;
Algorithm 1 Q-Learning Algorithm
• The Q-table is updated (Table VIII index 1), and we
Input: Random state; process the current state [T1 , T3 ];
Output: Q-Table; • Select a random action: T3 ;
1: Initialize Qπ (s, a) arbitrary for all (state, action) pairs; • Sequence: [T2 , T3 ];
2: repeat(for each episode τ ) 1
• Reward: R([T1 , T3 ], T3 ) = =
3: Initialize state s; makespan([T2 , T3 ])
4: repeat(each step of episode) 1
= 0.0588;
5: Choose action a from s using policy derived from 17
Qπ (e.g., -greedy,  ∈ [0, 1]); • Calculate the new Qπ (s, a): Qπ ([T1 , T3 ], T3 )
Take action a, observe r, s0 ;
h
6:
= Qπ ([T1 , T3 ], T3 ) + α R([T1 , T3 ], T3 )
7: Update Qπ (s, a) using (2); i
8: s ← s0 ; +γ max[Qπ ([T1 ], T1 )] − Qπ ([T1 , T3 ], T3 )
9: until s is terminal h i
10: until convergence (|Qπ π
τ (s, a) − Qτ −1 (s, a)| < ω); where
= 0 + 0.4 × 0.0588 + 0.8 × 0 − 0 = 0.0235;
ω is the infinitesimal value.
• The Q-table is updated (Table VIII index 3), and we
process the current state [T1 ];
we have n tasks, we will have n! feasible sequences. Here, we • Select a random action: T1 ;
aim to find the optimal sequence using a Q-learning method • Sequence: [T2 , T3 , T1 ];
1
described in Algorithm 1, where Qπ (s, a) is the value-function • Reward: R([T1 ], T1 ) = =
to maximize. At each step t, the Qπ (s, a) value function is makespan([T2 , T3 , T1 ])
1
iteratively updated using Bellman equation (2). = 0.0555;
18
Qπt+1 (s, a) = Qπt (s, a) + α[R(s, a) + γ max Qπt (s0 , a0 ) • Calculate the new Qπ (s, a): Qπ ([T1 ], T1 )
0 a (2) h
−Qπt (s, a)], = Qπ ([[T1 ], T1 ) + α R([T1 ], T1 )
i
where α is the learning rate, R(s, a) is the reward for taking +γ max[Qπ ([T1 ], ∅)] − Qπ ([T1 ], T1 )
action a at state s. The maxa0 Qπt (s0 , a0 ) value is the maximum h i
expected future reward for any action in state s0 . These = 0 + 0.4 × 0.0555 + 0.8 × 0 − 0 = 0.0222;
maximum values are stored in a table commonly called Q-
table. The Q-table has one row and one column for each • The Q-table is updated (Table VIII index 5), then the
possible state and action, respectively. next episode starts. This process is repeated until the con-
The parameters used for this tutorial are given in Table VII. vergence of the algorithm. Fig. 21 shows the makespan
The values of the Q-table are initially set to zero. The learning obtained for each episode. After a certain number of
steps to find the optimal sequence are described as follows: episodes we obtain the best sequence that is [T3 , T2 , T1 ].
TABLE VII: Q-Learning parameters
Name | Notation | Value
Greedy policy | ε | ε = 0.2
Learning rate | α | α = 0.4
Discount factor | γ | γ = 0.8
Reward | R | R = 1/makespan

TABLE VIII: Q-table of three tasks (columns T1–T3 are the actions)
Index | State | T1 | T2 | T3
1 | T1, T2, T3 | 0 | 0.03076 (a) | 0
2 | T1, T2 | 0 | 0 | 0
3 | T1, T3 | 0 | 0 | 0.0235 (a)
4 | T2, T3 | 0 | 0 | 0
5 | T1 | 0.0222 (a) | 0 | 0
6 | T2 | 0 | 0 | 0
7 | T3 | 0 | 0 | 0
8 | ∅ | 0 | 0 | 0
(a) Marked values indicate which action was selected in the corresponding state.

Fig. 21: Q-Learning for flow shop scheduling in MEC (makespan obtained in each episode; x-axis: episodes, y-axis: makespan).

B. Machine Learning for Resource Allocation in MEC

ML has been widely used for solving resource allocation problems in MEC networks. For instance, the authors of [231] propose an ML-based algorithm for resource allocation in edge and IoT networks. In particular, they use a clustering approach to categorize the IoT users into clusters. The cluster with the highest priority offloads and executes its tasks at the edge server, while the cluster with the lowest priority executes its tasks locally. For the other clusters, the task offloading decision is modeled by a Markov Decision Process (MDP), and a DQN is used to train the optimal policy. In the same spirit, the authors of [232] use a Q-learning (QL) approach for cross-layer resource allocation in cognitive radio networks (CRN). Since the QL approach leads to a long convergence time for large state and action spaces, they also use a DQN to address this challenge. ML is also used for spectrum-sharing cellular networks [233] and for resource trading in fog environments [234]. Furthermore, ML is applied for resource allocation in various networking systems such as IoT [235], vehicular communication networks [236], and SDN [237].

C. Deep Learning for Resource Allocation in MEC

Deep learning has also been used for resource allocation in recent wireless communication technologies such as 5G. For instance, the authors of [247] propose a deep learning method (LSTM) for resource allocation in 5G wireless networks with the objective to minimize the energy usage of the remote radio heads while considering the QoS constraints of the users. In [253], the authors present DRL-based resource allocation and power management in the cloud. Using an autoencoder and a weight-sharing structure, the convergence speed is accelerated, while the LSTM and RL are used to manage the server power usage. The simulation results based on Google cluster traces show that the proposed mechanism minimizes the power consumption and energy usage compared to the traditional schemes. Similarly, the authors of [254] propose a DRL-based resource allocation algorithm for V2V communications with the objective to minimize the interference of the V2V links to the V2I links while satisfying the latency constraints on the V2V links.

In summary, several ML and DL methods have been used for resource allocation in MEC networks. Table IX summarizes potential ML and DL methods for resource allocation and their advantages and disadvantages in MEC.

In the next sections, we provide an in-depth survey of recent works in this area from three perspectives: ML/DL-based methods for task offloading (Section VI), ML/DL-based methods for task scheduling (Section VII), and ML/DL-based methods for joint resource allocation (Section VIII).

VI. ML/DL-BASED METHODS FOR TASK OFFLOADING IN MEC

In this section, we survey state-of-the-art ML/DL-based methods for task offloading in MEC. We classify the research in this area into works focused on the minimization of latency (VI-A), minimization of the energy consumption while satisfying a QoS metric such as slowdown, response time, or execution delay (VI-B), finding a proper trade-off between multiple QoS metrics (VI-C), and finding a desirable trade-off between privacy protection, execution delay, and energy consumption (VI-D).

A. Minimization of Latency

The main objective of [12] is to find a proper offloading mechanism that can minimize the latency of the application. To this end, the authors propose a DRL-based offloading algorithm, which can effectively learn the offloading policy represented by a Seq2Seq neural network. The algorithm has three main phases. In the first phase, the priority of each task is calculated based on the EFT, which is the summation of the running cost of the current task and the maximal EFT of the previous tasks. Then, the tasks are sorted in the ascending order of EFT. In the second phase, the offloading policy is designed. In this phase, the sorted tasks are converted into embedding vectors, that is, 1 or 0, where 1 indicates that
the task is offloaded to the MEC server and 0 denotes that the task is executed locally on the device. The proximal policy optimization (PPO) [255] method is used to train the Seq2Seq neural network. Finally, the task is executed according to the results of the offloading decision of the second phase. The experimental results demonstrate that the proposed algorithm achieves lower latency than the HEFT-based and Round-Robin-based algorithms. For instance, when the number of tasks is equal to 10, the proposed DRL-based offloading algorithm obtains a latency of 349.7 ms, while the latencies of the HEFT-based and RR-based algorithms are 365.05 ms and 436.53 ms, respectively (see Table X). The drawback of the proposed algorithm is that it has weak adaptability to a new environment. Therefore, it requires full retraining to update the policy for the new environment, which is time-consuming.

TABLE X: Average latency (ms) of the DRL-based offloading method proposed in [12]
Tasks (n) | DRL-based | HEFT-based | Round-Robin-based
n = 10 | 349.7 | 365.05 | 436.53
n = 25 | 790.79 | 833.90 | 948.71
n = 40 | 1185.59 | 1262.06 | 1372.12
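As an illustration of the first two phases described above (EFT-based prioritization and the 0/1 offloading embedding), the following Python sketch sorts a small task set by an assumed EFT estimate and encodes a candidate offloading decision as a binary vector. The task chain, the cost values, and the thresholding rule are hypothetical; the actual scheme in [12] learns the 0/1 decisions with a Seq2Seq network trained by PPO rather than producing them with a fixed rule.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    run_cost: float                                      # estimated running cost
    preds: List["Task"] = field(default_factory=list)    # predecessor tasks

def eft(task: Task) -> float:
    """EFT = running cost of the task + maximal EFT of its predecessors."""
    return task.run_cost + max((eft(p) for p in task.preds), default=0.0)

# Hypothetical three-task chain t1 -> t2 -> t3
t1 = Task("t1", 4.0)
t2 = Task("t2", 2.0, preds=[t1])
t3 = Task("t3", 5.0, preds=[t2])
tasks = [t3, t1, t2]

# Phase 1: sort tasks in ascending order of EFT
ordered = sorted(tasks, key=eft)

# Phase 2 (stand-in for the learned Seq2Seq policy): embed each task as
# 1 (offload to the MEC server) or 0 (execute locally). Here a naive
# threshold on the running cost is used purely for illustration.
embedding = [1 if t.run_cost > 3.0 else 0 for t in ordered]

print([t.name for t in ordered])   # ['t1', 't2', 't3']
print(embedding)                   # [1, 0, 1]
```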
To solve this issue, the authors of [256] introduce a task offloading scheme based on meta-reinforcement learning (MRL), called MRLCO, that can learn a meta-offloading strategy for all IoT devices and rapidly obtain the proper policy for each IoT device. The main objective of [256] is to find an efficient offloading mechanism that minimizes the total latency. Compared to the previous work, which has weak adaptability to a new environment, the method in [256] can quickly adapt to new environments. The proposed offloading strategy is also modeled as a Seq2Seq neural network. The novelty of the MRLCO algorithm is its ability to achieve fast adaptation to new MEC environments with a small number of gradient updates and samples. The experimental results show that the MRLCO algorithm obtains the lowest latency compared to the scheme proposed in [12].

TABLE IX: Potential ML and DL Methods for Resource Allocation (RA) in MEC

ML Methods:
- DT — Main idea: uses a tree-like prediction model to learn from training datasets represented as a set of rules and decision branches. Advantage: easy to understand and has logarithmic time complexity. Disadvantage: a locally optimal decision is obtained at each leaf; hence, it cannot guarantee a globally optimal decision tree. Potential applications in RA: task orchestration [238]; task offloading with the objective to minimize the energy consumption of mobile devices [239].
- RF — Main idea: randomly uses many decision trees to acquire robust predictions made by the individual trees and increase the overall accuracy. Advantage: low prediction errors even for noisy workloads. Disadvantage: time-consuming. Potential applications in RA: modulation and coding prediction in 5G [240].
- SVM — Main idea: learns the optimal separating hyperplane between the two classes of the training dataset in the feature space [241]. Advantage: requires less computational resources. Disadvantage: monopolization issue: nodes with more assets have more power than others. Potential applications in RA: cooperative offloading in balloon networks [242].
- QL — Main idea: learns an action-value function Q(s, a) for each state-action pair; the Q(s, a) value is the cumulative return obtained after executing action a in state s. Advantage: simple to implement since it can be formulated by a single equation. Disadvantage: slow convergence rate since all Q-values must converge before attaining the optimal policy. Potential applications in RA: accurate task ordering and assignment [243], [244]; adaptive resource allocation [245].
- Bagging — Main idea: an ensemble machine learning technique that aggregates multiple prediction models to reduce the variance. Advantage: takes advantage of different learners to avoid overfitting of the models. Disadvantage: computationally expensive and loses interpretability. Potential applications in RA: prediction of task attributes [246].

DL Methods:
- Seq2Seq — Main idea: trains a model to generate a sequence of items in one domain from a sequence of items in another domain. Advantage: easy to convert a resource allocation decision into a sequence prediction process. Disadvantage: challenging with large input sequences since the output sequence is heavily related to the hidden state in the final output. Potential applications in RA: binary offloading decisions by converting the tasks into embedding vectors "0" and "1", where "1" indicates that the task is offloaded and "0" indicates that the task is executed locally [12].
- LSTM — Main idea: a type of RNN that can learn long-term dependencies in prediction problems. Advantage: easy to capture and store long-term dependencies. Disadvantage: hard to remember a decision after a long period; it also requires high memory bandwidth due to the linear layers in each cell. Potential applications in RA: resource allocation for TV service in 5G wireless networks [247].
- Autoencoder (AE) — Main idea: a type of NN used to train a compressed representation of raw data; it has two main layers, namely the encoder, which converts the input into the code, and the decoder, which uses the code to reconstruct the initial input. Advantage: good for feature extraction. Disadvantage: computationally time-consuming. Potential applications in RA: keeping track of the long-term dependencies that exist between the tasks' requirements and the VMs' specifications [248].
- CNN — Main idea: has two main steps: feature extraction, which performs a series of convolutions and pooling operations for detecting features, and the classification phase, which assigns a probability to the object for prediction [66]. Advantage: can strongly reduce the complexity of the network model through weight sharing. Disadvantage: requires large training datasets; CNN cannot encode the position of IoT devices. Potential applications in RA: feature extraction of the task queue [244]; power allocation for secure industrial IoT [249].
- RNN — Main idea: saves the output of a particular layer and feeds the result back to the input to predict the output of the layer. Advantage: can remember information over time. Disadvantage: very difficult to train for problems with long-term temporal dependencies because of the vanishing gradient problem [66]. Potential applications in RA: interference pattern prediction and power allocation in D2D networks [250].
- DRL — Main idea: allows an agent to approximate its policy and obtain the optimal solution without requiring any prior training data knowledge. Advantage: high convergence rate compared to RL; DRL enables IoT devices to learn optimal policies (e.g., channel selection) without knowing the environment (e.g., wireless channel information or mobility pattern). Disadvantage: a DRL agent requires millions of training data samples to learn optimal policies. Potential applications in RA: big data task scheduling [251]; optimal task ordering [251]; decision-making processes [252].
Another idea aiming at the minimization of the latency is introduced in [257]. When compared to the previous studies, the authors of [257] also maximize the weighted sum computation rate of all offloaded devices. The main objective of the paper is to find a mechanism that can achieve optimal task offloading policies and wireless resource allocations while considering the time-varying wireless channel conditions. To this end, the authors propose a DRL-based online offloading (DROO) algorithm with a DNN that learns the offloading decisions from experience. The DROO algorithm has two main phases: offloading action generation and offloading policy updating (see Fig. 22). The generation of the offloading action uses a DNN, which is represented by its embedded parameters θ. In the t-th time slot, the DNN takes the channel gain h_t as input and gives an offloading action x_t (x_t ∈ [0, 1]) as output based on its current offloading policy π_θt. The action is then quantized into K binary offloading actions, and the best action x*_t is selected based on the feasible computation rate. The solution for h_t is the output (x*_t, a*_t, τ*_t), which guarantees that all the constraints are satisfied. Finally, the network takes the action x*_t, obtains a reward Q*(h_t, x*_t), and adds the obtained state-action pair (h_t, x*_t) to the replay memory. In the offloading policy updating phase, a group of training samples is extracted from the memory to train the DNN, which appropriately updates its offloading policy from θ_t to θ_{t+1}. The new policy θ_{t+1} is used in the next time slot to generate the offloading decision x*_{t+1} based on the new channel h_{t+1}. Simulation results show that the DROO algorithm minimizes the latency by more than an order of magnitude compared to the benchmark approaches.

Fig. 22: Illustration of the DRL-based offloading scheme proposed in [257] (DNN-based offloading action generation from the channel gain of the t-th time frame, quantization into K binary offloading actions, and policy update from a replay buffer of channel-action training samples).
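The quantization step of DROO described above can be illustrated with a small NumPy sketch: a relaxed action in [0, 1]^N is mapped to K candidate binary offloading vectors, and the candidate with the best score is kept and stored in the replay buffer. The scoring function here (a weighted sum of per-device "rates") is a stand-in for the paper's computation-rate evaluation, and the flipping rule below is only one plausible way to generate the K candidates.

```python
import numpy as np

def quantize(x_relaxed: np.ndarray, k: int) -> list:
    """Map a relaxed offloading action in [0, 1]^N to K binary candidates.

    The first candidate thresholds at 0.5; the remaining ones flip the
    entries closest to 0.5 (the most 'uncertain' devices), which is one
    simple order-preserving strategy.
    """
    base = (x_relaxed > 0.5).astype(int)
    candidates = [base.copy()]
    uncertain = np.argsort(np.abs(x_relaxed - 0.5))   # most ambiguous first
    for i in uncertain[: k - 1]:
        flipped = candidates[-1].copy()
        flipped[i] ^= 1
        candidates.append(flipped)
    return candidates

def score(binary_action: np.ndarray, channel_gain: np.ndarray) -> float:
    """Placeholder for the feasible weighted-sum computation rate."""
    return float(np.sum(binary_action * np.log2(1.0 + channel_gain)))

rng = np.random.default_rng(0)
h_t = rng.rayleigh(scale=1.0, size=5)     # toy channel gains for 5 devices
x_t = rng.random(5)                       # relaxed DNN output in [0, 1]^5

candidates = quantize(x_t, k=4)
best = max(candidates, key=lambda a: score(a, h_t))
replay_buffer = [(h_t, best)]             # stored later to retrain the DNN
print("selected binary offloading action:", best)
```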
In [258], the authors investigate the offloaded task scheduling problem in edge computing to minimize the long-term cost, defined as a trade-off between task latency and energy consumption. To this end, a novel DRL-based algorithm called DRLOSM is proposed to solve the problem, which is modeled as an MDP. To improve the training efficiency, a proximal policy optimization (PPO) algorithm is introduced, which is a policy gradient scheme with good stability and reliability [259]. Additionally, a convolutional neural network (CNN) is integrated with the DNN scheme to better extract the features of the task queue. The DNN scheme estimates the offloading scheduling policy. In the training phase of the proposed algorithm, two DNNs are initialized, one with the parameter θ_old for sampling π_θold and another with θ for optimizing π_θ. In the sampling phase, N trajectories are sampled following the policy π_θold. To obtain an efficient training model, the Generalized Advantage Estimations (GAEs) for each time step in each trajectory are calculated in advance. Then, the sampled data are cached for the optimization phase. In the optimization phase, the value θ of the policy π_θ is updated for a certain number of epochs. After that, the policy π_θ is enhanced in each epoch by applying stochastic gradient ascent on the cached data. Finally, the sampling policy π_θold is updated with π_θ, the cached data are dropped, and the next iteration continues.
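The GAE pre-computation mentioned above is a standard PPO ingredient and can be sketched independently of the rest of DRLOSM. The function below computes generalized advantage estimates from per-step rewards and value predictions; the discount and smoothing parameters are common defaults, not values reported in [258].

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one sampled trajectory.

    rewards: array of length T with per-step rewards.
    values:  array of length T + 1 with value estimates V(s_0..s_T),
             where the last entry bootstraps the value after the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    returns = advantages + values[:-1]    # targets for the value function
    return advantages, returns

# Toy trajectory: four steps of (negative) latency/energy cost as reward.
adv, ret = gae(rewards=[-1.0, -0.5, -0.8, -0.2],
               values=[0.0, -0.6, -0.9, -0.4, 0.0])
print(adv, ret)
```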
B. Minimization of Energy Consumption while Satisfying a QoS Metric

The minimization of the energy consumption while satisfying the slowdown is the main objective of [260]. The authors propose a DRL-based offloading algorithm with a DNN as an approximator function that achieves the desired trade-off between energy consumption and slowdown. The simulation results show that the algorithm can learn the approximately optimal task offloading policy after several training rounds and that it is better than the baseline algorithms in terms of mean energy consumption and mean slowdown.

The goal to minimize the energy consumption and response time is pursued in [261]. This is accomplished by an algorithm based on classification and regression trees (CART), which finds the optimal offloading policy according to the device's QoS attributes such as capacity, availability, authentication, speed, and cost. The proposed algorithm is compared with the First Fit (FF) and local processing policy (LPL) schemes (i.e., execution is always done locally on the device). The simulation results show that the proposed CART-based offloading policy is better than the FF and LPL methods in terms of energy consumption and response time. For instance, the proposed algorithm is better than the FF method by 44% in terms of energy consumption. In terms of response time, the proposed algorithm outperforms the FF method by 50%.

Compared to the above studies, which assume that the wireless channel state information (CSI) is static and known by the mobile users, the authors of [262] present an offloading mechanism in a static and time-varying MEC system. Under static channel state information, the offloading problem is formulated as a noncooperative exact potential game (EPG), where each mobile user offloads its computation tasks to the edge server to selfishly maximize its processed CPU cycles in each time slot and reduce its energy consumption. Then, under time-variant and unknown channel state information, a payoff game theory is adopted, which is also proved to be an EPG. In this case, the offloading problem is solved using a Q-learning method and a best-response approach [263], which helps mobile users to adapt their offloading policies to dynamic wireless environments. The simulation results prove that the proposed mechanism outperforms the local processing and random approaches, and achieves at least 87.87% of the average payoff compared to the full CSI case.

In [264], the authors propose a novel architecture called Space-Air-Ground Edge (SAGE) to offload task-intensive services in maritime MEC networks to jointly minimize latency and energy consumption. The offloading problem is formulated as a Multi-Armed Bandit (MAB) learning problem. The MAB problem is a classic reinforcement learning problem of exploration and exploitation [265]. To solve the problem, an Upper Confidence Bound (UCB) based algorithm is proposed. Firstly, the proposed algorithm analyzes the historical records and the associated reward and cost values of the maritime IoT devices to find the edge server that will be selected. Then, the algorithm updates the confidence interval of each server's forward degree and the number of times that the server has been selected. Finally, the edge server with the lowest regret value that satisfies the offloading QoS is selected as the optimal server. The simulation results prove that the proposed algorithm outperforms the traditional UCB and ε-greedy algorithms under various conditions in terms of latency and energy consumption.
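A minimal UCB1-style sketch of this server-selection idea is given below. Each edge server is treated as an arm; the reward is a hedged placeholder (higher for lower observed latency/energy), and the index formula is the textbook UCB1 rule rather than the exact confidence-interval and regret bookkeeping used in [264].

```python
import math
import random

SERVERS = ["edge-A", "edge-B", "edge-C"]
# Hypothetical mean rewards (e.g., normalized inverse latency/energy cost).
TRUE_MEAN = {"edge-A": 0.55, "edge-B": 0.70, "edge-C": 0.40}

counts = {s: 0 for s in SERVERS}
totals = {s: 0.0 for s in SERVERS}

def observe_reward(server: str) -> float:
    """Placeholder feedback: noisy reward around the server's true mean."""
    return min(1.0, max(0.0, random.gauss(TRUE_MEAN[server], 0.1)))

for t in range(1, 501):
    # Play each arm once, then follow the UCB1 index.
    untried = [s for s in SERVERS if counts[s] == 0]
    if untried:
        chosen = untried[0]
    else:
        chosen = max(
            SERVERS,
            key=lambda s: totals[s] / counts[s]
            + math.sqrt(2.0 * math.log(t) / counts[s]),
        )
    reward = observe_reward(chosen)
    counts[chosen] += 1
    totals[chosen] += reward

best = max(SERVERS, key=lambda s: totals[s] / counts[s])
print("selection counts:", counts, "-> preferred server:", best)
```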
The authors of [266] propose an algorithm called D-DRL (Algorithm 2), which is based on DRL and a differentiable neural computer (DNC), for decentralized computation offloading in edge computing to achieve the optimal offloading policy. A DNC is a particular RNN with internal memory, which is capable of training on and remembering the previous hidden states of the input data. Hence, with a DNC, not only is the learning process accelerated, but the agent can also continue learning the policy when the network is uncertain and time-varying. Compared to previous approaches, which assume that all users should share their information, in [266] the users have the possibility to share or not share their QoS information. The offloading problem is formulated as a "multi-agent partially observable Markov decision process (POMDP)", and an algorithm based on DRL is proposed to solve the problem. This approach enables each user to determine the approximately optimal computation offloading scheme directly from the game history without any preliminary information about the other users. The D-DRL algorithm has the following components: actor-critic networks, a replay buffer, a policy optimizer, and actor-critic updating rules. Firstly, each user initializes the parameters of its actor and critic networks (line 2). At each time slot, each user observes its bandwidth and updates its parameters (lines 4-6). Then, each user inserts its previous observation, new observation, strategy, and reward into its replay buffer (line 8). After that, the current observation of each user is taken as the input of its actor network, and a part of its input data is uploaded to the edge server based on the output of the actor network. Next, each user calculates its reward and updates its actor and critic networks for each time slot. The data stored in the replay buffers are taken as one mini-batch, and the users update the actor and critic networks by calculating the gradients (lines 13-16). Finally, each user optimizes its actor and critic networks by mini-batch stochastic gradient and clears its replay buffer. Simulation results show that the D-DRL algorithm outperforms the baseline algorithms. For instance, D-DRL takes about 8000 time slots to converge to a stable state, while the MAPPO algorithm [267] and the MAA2C algorithm [268] take about 14000 and 18000 time slots, respectively.

Algorithm 2 D-DRL Algorithm Proposed in [266]
1: Input: computational tasks, states, and actions;
2: Output: Optimal offloading policy;
3: for user n ∈ N do
4:    Initialize the learning parameters;
5: end for
6: for time slot k ∈ {1, 2, ...} do
7:    for user n ∈ N do
8:       Observe b_n^k and update its observation o_n^{k−1} into
9:       o_n^k;   ▷ b_n^k is the bandwidth of user n at time slot k
10:      Store (o_n^{k−1}, x_n^{k−1}, o_n^k, r_n^{k−1}) into the replay buffer;
11:      Input o_n^k into the actor network π_θn and determine the size of input data x_n^k uploaded to the edge server;
12:      Compute its reward r_n^k = u_n(x_n^k, x_{−n}^k);
13:   end for
14:   if k mod D == 0 then
15:      for m ∈ {1, 2, ..., M} do
16:         for user n ∈ N do
17:            Compute the gradients;
18:            Update user n's actor and critic networks every D time slots;
19:         end for
20:      end for
21:      Clear the replay buffer;
22:   end if
23: end for

C. Trade-Off Between Execution Delay, Task Drops, Queueing Delay, Failure Penalty, and Cost

While the previously surveyed works focused on single-objective or bi-objective offloading problems, the authors of [269] present a trade-off analysis between the execution delay, task drops (i.e., once the task queue is full), queueing delay, failure penalty, and execution cost for the offloading decision. Also, compared to [262], which considers only the CSI, in [269] the authors consider the task queue state, the energy queue state, and the CSI between the mobile users and the base stations. The task offloading decision is formulated as an MDP. Two DDQN learning algorithms are proposed to train the optimal task offloading policy without any prior knowledge of the CSI. Simulation results show that the proposed algorithms, called "DARLING" and "Deep-SARL", are better than the baselines in terms of long-term utility. The proposed algorithms obtain an optimal trade-off among the task execution delay, the task drops, the task queueing delay, the task failure penalty, and the MEC service cost compared to the baselines. Furthermore, the Deep-SARL algorithm outperforms the DARLING algorithm by exploiting the additive structure of the utility function.
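Since [269] relies on double deep Q-networks, the following short sketch shows the core DDQN target computation that distinguishes it from a vanilla DQN: the online network selects the next action, while the target network evaluates it. The tiny linear "networks" below are placeholders for the DNNs trained in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATE, N_ACTION, GAMMA = 6, 3, 0.9

# Placeholder "networks": linear Q-functions with separate weights.
w_online = rng.normal(size=(N_STATE, N_ACTION))
w_target = w_online.copy()

def q_values(weights, state):
    return state @ weights            # shape: (N_ACTION,)

def ddqn_target(reward, next_state, done):
    """Double DQN: the online net picks the action, the target net evaluates it."""
    if done:
        return reward
    best_action = int(np.argmax(q_values(w_online, next_state)))
    return reward + GAMMA * q_values(w_target, next_state)[best_action]

# One toy transition (state features could encode queue lengths and CSI).
s_next = rng.normal(size=N_STATE)
print("TD target:", ddqn_target(reward=-0.4, next_state=s_next, done=False))

# Periodically the target network is refreshed from the online network.
w_target = w_online.copy()
```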
The drawback of both previously surveyed works is that the offloading decision scheme does not take into account the application's security and privacy requirements. Hence, the next section surveys studies that address security-aware and privacy-aware task offloading problems.

D. Trade-Off Between Privacy, Execution Delay, and Energy Consumption

The protection of the mobile user's location and usage pattern privacy in MEC networks while minimizing the delay and energy consumption of the offloaded tasks is the
main objective in [271]. To achieve this goal, the authors formulate the optimization problem as a constrained Markov decision process (CMDP) and solve it using Q-learning and Lagrangian approaches. The proposed scheme is described as follows: at each time slot n, the mobile user takes an action a_n = arg min_{a∈A} Q_n(s_n, a) based on the observation of the buffer and channel state s_n. Then, it observes the next state s_{n+1}. Next, it updates the Q_n(s, a) value of the Q-function Q(s, a) and the estimated value of the Lagrangian multiplier. The numerical experiments prove that when the privacy level of the mobile user is equal to 0.6, the extra delay and energy costs caused by the baseline algorithm are 100%, while the delay and energy costs incurred by the proposed algorithm are 45%. However, the proposed algorithm suffers from slow learning convergence.
learning convergence. policy is obtained. To overcome the slow convergence of
To overcome the drawback of the above study, the authors of the QL-based algorithm for a larger state-action, the authors
[270] propose a novel offloading algorithm with the objective propose another algorithm called DRLO that uses a DNN to
to protect both the user location privacy and the usage pattern approximate the Q-values instead of using the traditional Q-
privacy while minimizing the computation latency and the table. The simulation results show that the DRLO algorithm
energy consumption cost of healthcare IoT devices. This is obtains a better long-term reward compared to the conventional
accomplished by an RL-based algorithm that achieves the QL-based method. For instance, when the trade-off value is
optimal offloading decision while protecting the user’s pri- equal to 0.8, the experiment results proves that the convergence
vacy and minimizing both the energy consumption and the of the DRLO algorithm is 9% higher than that of the QL-based
computation latency of the IoT devices. To accelerate the scheme. The experiment results also prove that the DRLO
learning process a transfer learning technique, i.e., PDS (Post algorithm outperforms the benchmark algorithms in terms of
Decision State) scheme [272] is used. The proposed algorithm offloading cost, energy consumption, and privacy-preserving.
is described as follows (see Algorithm 3): at each time slot k, For example, for mining 100 kb blockchain transactions, the
the IoT device observes the current state s(k) that depends privacy level of the DRLO algorithm is 5.5% better than the
on the new generated healthcare data size and its priority, conventional QL-based approach and 13.4% better than the
the state of the current radio channel, the current renewable CMDP-based method proposed in [271].
energy generated, the previous computation records in the
buffer, and battery level of the IoT device. After evaluating
the healthcare data of size C1k , the IoT device calculates the E. Lessons Learned from ML/DL-based Task Offloading
priority of the healthcare data denoted by χk , and evaluates A summary of studies on ML and DL for task offloading in
the channel power gain denoted by hk . Based on historical MEC is illustrated in Table XI. The majority of ML and DL-
data and the previous offloading experiences, the IoT device based offloading decision algorithms aim to minimize latency
or to find a proper trade-off between the energy consumption at the IoT device and the latency. Besides, most of the studies use the DRL method or a combination of ML and DL methods to solve the offloading problem because these approaches converge faster than standard reinforcement learning.

TABLE XI: Summary of Studies on ML and DL for Task Offloading in MEC
Ref. | Learning Type | Algorithm | Mathematical Model | Simulation Tools | Optimization Criteria
2020, [266] | Deep RL | DRL | Game + POMDP | Python-based | Minimize the energy consumption and delay cost.
2019, [274] | SL | SVM | - | - | Maximize the throughput.
2020, [273] | Deep RL | QL+DQN | MDP | - | Maximize the user privacy level while minimizing the offloading latency and energy consumption of the mobile user device.
2018, [262] | RL | QL | Game theory | MATLAB | Minimize the energy of the mobile device.
2019, [269] | Deep RL | DDQN | MDP | Tensorflow | Trade-off analysis between the execution delay, task drops, queueing delay, failure penalty, and execution cost.
2020, [261] | RL | CART | MDP | Cloudsim | Minimize the energy consumption and response time.
2017, [271] | RL | QL | CMDP | - | Protect the mobile user's location and usage pattern privacy while minimizing the delay and energy consumption of the offloaded tasks.
2019, [270] | RL | QL+TL+PDS | MDP | - | Protect both the user location privacy and the usage pattern privacy while minimizing the computation latency and the energy consumption cost of healthcare IoT devices.
2019, [260] | RL | DRL + DNN | - | - | Minimize the energy consumption while satisfying the slowdown.
2019, [12] | DRL | Seq2Seq | MDP | - | Minimize the latency of the application.
2020, [258] | Deep RL | PPO + CNN + DNN | MDP | Tensorflow | Minimize the long-term cost defined as a trade-off between task latency and energy consumption.
2021, [256] | MRL | Seq2Seq NN | MDP | Tensorflow | Minimize the total latency.
2020, [257] | Deep RL | DNN | Non-convex MIP | Tensorflow | Minimize the latency while maximizing the weighted sum computation rate of all offloaded devices.

From the surveyed papers focused on task offloading, we learned the following key facts:
• Designing an efficient task offloading scheme is a crucial challenge in MEC systems. The main reason is that, although offloading tasks to MEC servers minimizes the energy consumption of the IoT device, since the execution does not have to be done locally, it also incurs additional execution delays, including the time to send the offloaded tasks to the MEC server, the time to execute the offloaded tasks on the server, and the time to send the execution results back to the IoT device. In particular, there is a trade-off between the energy consumption of the IoT device and the execution delay. For this reason, the majority of the surveyed works aim to find a mechanism that minimizes the energy consumption of the IoT device while satisfying the execution delay.
• The performance of an ML/DL-based offloading method is strongly related to the methods used to train the optimal offloading policy. The reason is that the training method determines the long-term reward (i.e., the objective function). Also, the works that use two training methods (e.g., [76]) outperform those that use a single training method (e.g., [274]), because with two training models, each model can train its optimal offloading policy to cooperatively optimize the long-term reward.
• For delay-sensitive and real-time applications, ML/DL-based methods outperform the traditional methods for task offloading in MEC. Traditional offloading techniques (e.g., approximation, heuristic, and meta-heuristic) are computationally expensive. Therefore, they are not suitable for delay-sensitive and real-time IoT applications (e.g., online gaming, virtual reality, and augmented reality). Also, compared to traditional offloading approaches, ML/DL-based offloading methods have the ability to predict both the delay sensitivities of each application and the MEC servers' computation capabilities by learning from historical data and previous offloading experiences. For instance, the DRL-based offloading algorithm outperforms the heuristic HEFT-based and Round-Robin-based algorithms in terms of latency [12].
• The DRL-based offloading algorithm converges faster than the traditional RL-based algorithm when the number of states and actions is large. This is due to the fact that the traditional RL method uses a Q-
table to approximate the Q-values, while the DRL method uses neural networks (e.g., a DNN or DQN) to approximate the Q-values [273]. Therefore, for large state and action spaces, the RL method leads to a long convergence time compared to the DRL-based method.
• Transfer learning (TL) can significantly accelerate the training process and improve the performance of a DL-based offloading algorithm. The reason is that the TL method can learn from an existing offloading model to solve a similar offloading problem [275]. In the training phase of TL, the offloading model is divided into multiple sub-offloading models, where each sub-offloading model can learn from other similar sub-offloading models to accelerate its learning process. Transfer learning requires only small targeted training datasets to obtain high accuracy [276]. Therefore, TL reduces the re-training time and consumes a smaller amount of bandwidth.
VII. ML/DL-BASED METHODS FOR TASK SCHEDULING IN MEC

In this section, we survey state-of-the-art ML and DL methods for task scheduling in MEC. The main objective of the works focused on ML/DL-based methods for task scheduling is to minimize the execution time (VII-A), to minimize energy consumption while satisfying the response time (VII-B), to find a proper trade-off between response time and utilization costs of MEC resources (VII-C), and to minimize the communication cost (VII-D).

A. Minimization of Execution Time

One of the advantages of ML and DL techniques for task scheduling problems is their ability to minimize the application's task execution time by predicting the computation capabilities of the target environment. To minimize the total execution time (or makespan), the authors of [277] propose a two-phase learning-based algorithm, which schedules data-intensive tasks onto a cluster of resources. Firstly, the algorithm selects the cluster containing the nodes with the lowest data communication cost. Then, an adaptive assignment policy based on Q-learning is used to select a proper node in the selected cluster. The adaptive assignment policy comprises one global broker agent that selects the cluster with the minimum data communication cost, and several local broker agents within the selected clusters that select the proper node to execute the task. The experimental results show that the Q-learning-based scheduling algorithm outperforms the HCS [278] algorithm in terms of makespan for different workload configurations.

The minimization of the execution time is also the main goal in [244]. This is achieved by an algorithm called QL-HEFT that combines the Q-learning algorithm and the HEFT algorithm. The QL-HEFT algorithm has two main phases: a task prioritization phase for obtaining an optimal task order using the Q-learning method, as described in Algorithm 5, and a processor selection phase for selecting the suitable server to execute the ready task. Firstly, the QL-HEFT algorithm uses the Q-learning method to obtain an optimal task order by sorting the original order, i.e., the HEFT-based task order. A random selection approach is used to establish enough training in the state s and transfer to the state s′ (line 4). After the agent selects an action a in the state s, the QL approach is used to update the corresponding Q-value Q(s, a) in a Q-table (line 7). The iteration process continues until a final Q-table is obtained. After obtaining the optimal task order, the task with the highest priority is executed on the server that achieves its minimum earliest finish time (line 11). The QL-HEFT algorithm is compared with the HEFT_D, HEFT_U, and CPOP algorithms using CloudSim. The experimental results show that the QL-HEFT algorithm outperforms these three algorithms in terms of makespan and speedup. The drawback of the QL-HEFT algorithm is that it requires a specific learning rate and discount factor to converge to the optimal solution, which is time-consuming.

Algorithm 5 The QL-HEFT Algorithm [244]
Input: Task graphs;
Output: Makespan;
1: Initialize the Q-table, learning rate, and discount factor;
2: Calculate the immediate reward (rank_u);
3: repeat (for each episode)
4:    Randomly select an entry task as the current task Tc;
5:    repeat (for each step of the episode)
6:       Randomly choose a legal task (except Tc) as the next task Tnext;
7:       Update Q(Tc, Tnext) by Eq. (2);
8:       Tc ← Tnext;
9:    until Tnext is terminal;
10:   Obtain a task order according to the updated Q-table;
11:   Allocate a processor to each task;
12:   Obtain the makespan;
13: until convergence (the makespan no longer changes);
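To make the two QL-HEFT phases concrete, the sketch below computes an upward-rank-style priority (rank_u) on a tiny DAG and then assigns each task, in priority order, to the processor giving the earliest finish time. The DAG, execution-cost table, and communication costs are invented for illustration; QL-HEFT additionally reorders the rank_u-based order with Q-learning, which is omitted here.

```python
# Tiny DAG: task -> list of (successor, communication cost)
SUCC = {"A": [("B", 2), ("C", 3)], "B": [("D", 2)], "C": [("D", 1)], "D": []}
PRED = {"A": [], "B": [("A", 2)], "C": [("A", 3)], "D": [("B", 2), ("C", 1)]}
# Execution cost of each task on each of two processors (illustrative).
COST = {"A": [4, 6], "B": [3, 2], "C": [5, 4], "D": [2, 3]}

def rank_u(task):
    """Upward rank: average cost + max over successors of (comm + rank_u)."""
    avg = sum(COST[task]) / len(COST[task])
    return avg + max((c + rank_u(s) for s, c in SUCC[task]), default=0.0)

order = sorted(COST, key=rank_u, reverse=True)   # higher rank_u scheduled first

proc_ready = [0.0, 0.0]          # time at which each processor becomes free
finish, placed_on = {}, {}
for task in order:
    best = None
    for p in range(2):
        ready = proc_ready[p]
        for pred, comm in PRED[task]:
            arrival = finish[pred] + (0 if placed_on[pred] == p else comm)
            ready = max(ready, arrival)
        eft = ready + COST[task][p]          # earliest finish time on processor p
        if best is None or eft < best[0]:
            best = (eft, p)
    finish[task], placed_on[task] = best[0], best[1]
    proc_ready[best[1]] = best[0]

print("priority order:", order)
print("makespan:", max(finish.values()), "placement:", placed_on)
```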
While the authors of [244] use the QL method to prioritize the tasks, in [279] the authors propose a novel scheduling method called DeepSoCS that can learn the best task prioritization by using DRL. DeepSoCS comprises two main phases, namely the task ordering phase and the server selection phase. DeepSoCS employs the server selection strategy of the HEFT algorithm, i.e., the server that achieves the earliest finish time executes the ready task. In the task prioritization phase, two Message Passing Neural Networks (MPNNs) capture the tasks' features (i.e., task dependencies and communication costs). The first one, denoted g1, takes a DAG as input and calculates the tasks' features by considering information about their neighboring edges. The second one, denoted g2, captures all tasks' features and takes jobs as inputs, then computes local and global features of the jobs. Next, the tasks' features are constructed from the forward task information. Finally, a task is selected by using a conventional policy network, which is defined as the probability of taking an action a in a state s. The experimental results show that DeepSoCS outperforms the heuristic HEFT algorithm in terms of makespan and robustness under noise conditions.
Another approach aiming at the minimization of the task execution time is proposed in [280]. Compared to the previous work, the authors of [280] propose a scheduling algorithm based on a Deep Q-Network (DQN) called RLTS. The algorithm has three main phases, including task ordering, state transition, and the task scheduling training process. The application's tasks are ordered based on the upward rank value (as in HEFT [281]). In the state transition phase, after the action a is executed, i.e., after a task is allocated to the server a, the state space (i.e., the task's start time and finish time) changes from the present state s to the next state s′. Then the reward at state s is calculated. In the task scheduling process, the DQN-based scheduling algorithm uses neural networks to calculate the action-value function rather than updating the Q-table by the Q-learning approach. Two neural networks, namely a target Q-network and an evaluated Q-network, are used. The output of each neural network is the probability of choosing an action. An episode starts with an empty state space and proceeds until the tasks are assigned to the servers. For each task, an action (server) is selected by the ε-greedy policy, which selects the optimal server with probability 1 − ε and selects a random server with probability ε. Then, the reward and the next state are obtained. The simulation results show that the proposed RLTS algorithm outperforms the HEFT and PEFT [109] algorithms in terms of makespan. For instance, when the number of tasks is equal to 100, the RLTS algorithm outperforms the HEFT algorithm and the PEFT algorithm by 20% and 16%, respectively.

In [251], the authors propose a new cluster scheduler framework called "Spear" for dependent task scheduling to minimize the makespan. For this purpose, Spear uses Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL). MCTS is a search approach for sequential decision-making problems where the result is a win or a loss. MCTS keeps a state tree, where the nodes represent a path of actions and the edges indicate individual actions. In Spear, a neural network is first designed to represent a scheduling policy. Then, the network is trained to minimize the makespan. In the DRL model, a neural network takes as input a list of ready tasks and the state of the cluster, and then provides a scheduling action. The DRL method comprises three phases: state, action, and reward. In the state phase, Spear first adopts the b-level approach as a task prioritization scheme. Since the b-level only captures information about the execution time of the tasks, the b-load, which accumulates the load of the tasks, is also considered. The b-load is defined as the product of the task execution time and the resource demands. In the action phase, once the DRL agent is called for an action, it draws one action from the action space. Then, the tasks in the cluster are executed for one time slot. Finally, in the last phase, the agent receives a −1 reward each time the processing action is selected, in order to capture the schedule length. The total accumulated reward is equal to the negative of the given graph's makespan. The experimental results show that Spear outperforms the Graphene [282] algorithm in terms of makespan. For instance, on average, Spear surpasses Graphene by 90% in terms of makespan. In terms of algorithm running time, Spear also outperforms Graphene, since the average running time of Spear is 500 seconds while that of Graphene is 1000 seconds.
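The b-level and b-load priorities used in Spear's state encoding can be computed with a short recursion over the task DAG, as sketched below. The example graph, execution times, and resource demands are made up, and the b-load here is accumulated along the longest downstream path, which is only one reading of the description above; Spear itself combines these features with a learned policy network, which is not reproduced here.

```python
# Toy DAG: task -> successors, with per-task execution time and resource demand.
SUCC = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"], "t4": []}
EXEC = {"t1": 3.0, "t2": 2.0, "t3": 4.0, "t4": 1.0}
DEMAND = {"t1": 2.0, "t2": 1.0, "t3": 3.0, "t4": 1.0}   # e.g., CPU cores

def b_level(task):
    """Length of the longest (execution-time) path from `task` to an exit task."""
    return EXEC[task] + max((b_level(s) for s in SUCC[task]), default=0.0)

def b_load(task):
    """Accumulated load (execution time x resource demand) along that path."""
    own = EXEC[task] * DEMAND[task]
    return own + max((b_load(s) for s in SUCC[task]), default=0.0)

for t in EXEC:
    print(t, "b-level =", b_level(t), "b-load =", b_load(t))
# Tasks with larger b-level/b-load would be prioritized in the scheduler state.
```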
The drawback of the above-surveyed scheduling algorithms is that they consider only the makespan as a performance metric, whereas other metrics such as cost and robustness need to be considered to improve the convergence of the algorithm. For this reason, the authors of [283] present a multi-agent Deep Q-network (DQN) algorithm with reinforcement learning for multi-objective application scheduling to minimize both the execution time and the cost. To achieve this goal, a Markov Game Model (MGM) is used, which treats the scheduling goals as two agents. Each agent determines its actions based on a neural network by mixing the output of the neural network with random actions to sample its training data.

The authors of [245] address not only makespan minimization but also the response time and robustness of the algorithm. To this end, they propose an online Q-learning scheduling algorithm that can adapt to the task arrival and execution processes automatically. The scheduling problem is formulated as an MDP and solved by a QL approach. The algorithm has three main phases: initialization, action selection for a task, and learning. The first stage aims to initialize the "discounted accumulative reward (DAR)" of each action. DAR is an objective function used in MDP problems which takes forthcoming allocations into account. The second phase aims to allocate a task to the processing units. If the task is explored with an ε-greedy exploration probability value pe, the allocation is performed randomly. Otherwise, the allocation is performed using the learning scheme. Finally, the learning phase determines the estimated expected reward of an allocation and the estimated expected reward of a certain task type. The experimental results show that the online Q-learning scheduling algorithm outperforms the Min-Min, Min-Max, Suffrage, and ECT algorithms in terms of response time and robustness.

B. Minimization of Energy Consumption While Satisfying Response Time

The scheduling and resource allocation problem in IoT devices is addressed in [284] with the objective to minimize the energy consumption at the IoT device while satisfying the response time. The authors formulate the problem as an MDP and solve it using a reinforcement learning approach. In particular, they present an algorithm called DA-DRLS that takes advantage of DNN and RL. The algorithm can rapidly adapt to a sudden change in the IoT device requirements by continuously observing "demand drift" and dynamically updating the scheduling policy. Simulation results show that the proposed algorithm outperforms the benchmark algorithms in terms of energy consumption (reduced by 36.7%) and response time (reduced by 59.7%). It also increases the resource utilization by at least 10.4% compared to the benchmark methods.

The optimization of both the energy consumption and the application performance is the main objective in [285]. This is pursued by an algorithm called ISVM-Q that combines the Q-learning and SVM methods for scheduling the application tasks in a wireless sensor network (WSN). The WSN comprises n sensor nodes which are always learning
until the ending learning condition is reached. The SVM model is improved by using a linear basis function to obtain the current state of the system, and the Q-value (i.e., the output of the improved SVM model) is obtained by estimating the regression model. Then, the algorithm selects an action that corresponds to the current estimated Q-value. Finally, the selected action is executed to get the corresponding reward, and the Q-value is updated. Experimental results show that the ISVM-Q algorithm outperforms the baseline algorithms in terms of energy consumption and application performance. For instance, the performance of ISVM-Q is 0.55% higher than that of the IQ algorithm proposed in [286].

The minimization of the task execution time and energy consumption is the main goal in [287]. The authors investigate the task allocation problem for Multi-task Transfer Learning (MTL) in edge computing. The main idea consists of dividing a machine learning-based application into multiple machine learning tasks, where each task can learn from other tasks to improve its performance. To achieve this goal, they propose a "Data-driven Cooperative Task Allocation" approach based on clustered reinforcement learning (CRL) and SVM models. They formulate the problem as an MDP < S, A, P, r, λ >, where S, A, P, r, and λ denote the states, actions, transition probability, reward function, and discount factor for future rewards, respectively. The CRL model makes allocation decisions based on the relation between the observations of the current environment and those previously seen. Concerning the SVM model, it predicts the task importance and dynamically adjusts the CRL model's allocation decisions based on real-time data. The experimental results show that the proposed mechanism reduces the processing time by 3.24 times and saves 48.4% of energy consumption.

C. Trade-Off Between Response Time and Resource Utilization Costs

In [288], the authors address the task scheduling problem with the objective to minimize the task response time while maximizing the utilization of VM resources. This is achieved by an algorithm based on queueing theory [289] and reinforcement learning schemes. Since the M/M/1 queueing system is hard to analyze in a cloud/edge environment (due to the complexity and dynamicity of the system), the authors propose a new queueing model divided into three submodels, namely, the task scheduling submodel (TSSM), the task execution submodel (TESM), and the task transmission submodel (TTSM). The TSSM receives user tasks (requests) and pushes them into a global queue Gq using a FIFO approach. Then, the task dispatcher implemented in the TSSM pushes user tasks to the corresponding buffer queue of the TESM module, where they are assigned to the VMs residing in the TESM. Next, the VM submits the execution results to the TTSM module. Finally, the TTSM module transmits the execution results to the requesting users. The main objective of the task dispatcher is to schedule the tasks in Gq onto VM resources. This is accomplished by a Q-learning approach which continuously interacts with the environment to obtain the optimal policy. The proposed method, called Q-sch, is compared with the random scheduling scheme (i.e., users' tasks are randomly scheduled to the VMs), the equal scheduling scheme (i.e., users' tasks are ordered and then scheduled to the VMs), and the mix-scheduling scheme (i.e., a user's task is first randomly scheduled to a VM; if the remaining buffer memory of that VM is equal to zero, the task is rescheduled to the VM with the largest remaining buffer memory). In terms of response time, the experimental results show that the Q-sch algorithm outperforms the random scheduling, mix-scheduling, and equal scheduling schemes by 1.85%, 2.45%, and 4%, respectively, when the task arrival rate varies. The results also prove that the Q-sch algorithm improves the resource utilization while reducing the average response time. The drawback of the proposed approach is that it uses a random scheduler, i.e., the First Come First Serve (FCFS) approach, to process user requests. However, the FCFS scheme is inappropriate in a cloud/edge environment because the performance of a cloud/edge platform depends on how well it can satisfy the users' requirements specified in the SLA.

When compared to the previous work, the objective of [248] is to minimize not only the CPU utilization cost but also the RAM utilization cost. The authors use a DRL approach to model the task scheduling problem. In this approach, the state represents an "offloaded task", the action represents a VM, and the reward represents the task execution cost on the VM. A manager node observes the resource utilization of each VM to populate the reward values of each action. Then, the manager uses the DRL algorithm to assign the newly arrived task to the optimal VM. The proposed DRL approach integrates a Long Short-Term Memory (LSTM) layer, which keeps track of the long-term dependencies that exist between the tasks' requirements and the VMs' specifications. The DRL approach integrated with LSTM helps to improve the decision-making process and reduce the runtime of the DRL by storing the long-term dependencies in the LSTM's memory cell. The experiments based on real-world datasets show that the proposed method outperforms the Shortest Job First (SJF) algorithm by 28.8%, the RR algorithm by 14%, and the PSO algorithm by 14% in terms of CPU utilization. The results also prove that the proposed algorithm minimizes the RAM utilization cost by 31.25%, 25%, and 18.78% compared to SJF, RR, and PSO, respectively.
system is hard to analyze in a cloud/edge environment (due While energy consumption is not considered in the two
to the complexity and dynamicity of the system), the authors previous works, the main objective of [243] is to minimize
propose a new queueing model divided into three submodels, the energy consumption while reducing task response time and
namely, the task scheduling submodel (TSSM), task execution maximizing CPU utilization. To this end, the authors propose
submodel (TESM), and task transmission submodel (TTSM). a Q-learning-based task scheduling algorithm in data centers
The TSSM receives user tasks (requests) and pushes them that has two main phases: task dispatcher for assigning user
into a global queue Gq using FIFO approach. Then, the task requests to servers in the data center and task scheduling phase
dispatcher implemented in TSSM pushes user tasks to the for ordering the assigned tasks in each server. The dispatcher
corresponding buffer queue of the TESM module, which is (i.e., global scheduler) is implemented at the data center level,
then assigned to the VMs resided in TESM. Next, the VM where an M/M/S queueing system (i.e., a single queue with
submits the execution results to TTSM module. Finally, the more than one parallel server) is used to reduce the energy
TTSM module transmits the execution results to the requesting consumption in the data center. In this M/M/S queueing model,
users. The main objective of the task dispatcher is to schedule the task arrivals are supposed to follow the Poisson distribution
the tasks in Gq to VM resources. This is accomplished by model, and the server’s service time follows a negative expo-
a Q-Learning approach which continuously interacts with nential distribution. It is also supposed that the servers provide
the environment to obtain the optimal policy. The proposed the same type of services. The task dispatcher distributes
method called Q-sch is compared with the random scheduling requests to the servers uniformly when all the servers in the
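To make the DRL-plus-LSTM scheduler of [248] more concrete, the sketch below builds a small PyTorch Q-network whose LSTM layer summarizes a short history of task-requirement and VM-utilization features and outputs one Q-value per candidate VM. The feature dimensions, layer sizes, and synthetic input are illustrative assumptions, not the exact architecture used in [248].

```python
import torch
import torch.nn as nn

class LSTMQNetwork(nn.Module):
    """Q-network with an LSTM layer, in the spirit of the DRL+LSTM scheduler in [248].

    Input:  a sequence of feature vectors describing recent task requirements and
            VM utilization (dimensions are illustrative assumptions).
    Output: one Q-value per candidate VM (action = "schedule the task on VM i").
    """

    def __init__(self, feature_dim: int = 16, hidden_dim: int = 64, num_vms: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(input_size=feature_dim, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_vms),
        )

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, feature_dim); the LSTM memory cell carries the
        # long-term dependencies between task requirements and VM specifications.
        _, (h_n, _) = self.lstm(seq)
        return self.head(h_n[-1])              # (batch, num_vms) Q-values

if __name__ == "__main__":
    torch.manual_seed(0)
    q_net = LSTMQNetwork()
    history = torch.randn(1, 10, 16)           # ten past observations (synthetic)
    q_values = q_net(history)
    best_vm = int(torch.argmax(q_values, dim=1))
    print("VM selected for the new task:", best_vm)
```

In a full scheduler, the observed execution cost of the chosen VM would be fed back as the reward used to update the network parameters.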


While energy consumption is not considered in the two previous works, the main objective of [243] is to minimize the energy consumption while reducing the task response time and maximizing the CPU utilization. To this end, the authors propose a Q-learning-based task scheduling algorithm for data centers that has two main phases: a task dispatching phase for assigning user requests to servers in the data center, and a task scheduling phase for ordering the assigned tasks in each server. The dispatcher (i.e., the global scheduler) is implemented at the data center level, where an M/M/S queueing system (i.e., a single queue with more than one parallel server) is used to reduce the energy consumption in the data center. In this M/M/S queueing model, the task arrivals are assumed to follow a Poisson distribution, and the servers' service times follow a negative exponential distribution. It is also assumed that the servers provide the same type of services. The task dispatcher distributes requests to the servers uniformly when all the servers in the data center are treated as a whole. In this way, the average task response time is shorter compared to the M/M/1 queueing system [290], which considers the servers in the data center as independent. After a task is assigned to a server, it is first pushed onto a queue Q, waiting to be scheduled to a specific VM. A time window is used to determine when to process the tasks in Q. For each time window, the scheduler first removes all the tasks from the queue Q, then assigns each task to a specific VM by pushing it to the buffer queue implemented in that VM. A dynamic task prioritization approach based on task laxity and task lifetime is used to order all the tasks in Q, and a Q-learning-based approach is employed to reward task assignments so as to simultaneously optimize the task response time and the CPU utilization. The Q-learning-based scheduler continuously learns from the environment to obtain the optimal policy that meets this objective. That is, if a is an action (i.e., assigning the task t to VM_i), the scheduler will receive a reward of 1 in case that (i) VM_i can satisfy the deadline of the task t; (ii) VM_i is the VM with the smallest waiting time; and (iii) VM_i gives the best CPU utilization for the task t.

Another idea aiming at the minimization of the MEC resource utilization cost is introduced in [76]. Compared to the previous studies, the authors of [76] also minimize the rate of missed tasks. To this end, the authors propose a two-phase scheduling algorithm based on deep learning. The first phase of the algorithm determines the location where the task should be executed (i.e., on the edge or in the cloud). The second phase of the algorithm assigns the task to the edge or the cloud according to the execution location determined in the first phase. To determine the location of the task execution, i.e., whether the IoT task will be executed on the edge or cloud nodes, three different clustering methods based on the self-organizing map are used, namely "task clustering by self-organizing map (SOM)", "task clustering by hierarchical self-organizing map (H-SOM)", and "task clustering by autoencoder and self-organizing map (AE-SOM)". The SOM method directly sends the parameters of the tasks for clustering. Concerning the H-SOM, it is used in every layer. In the AE-SOM method, the encoder extracts the task features (e.g., task type, task priority, task privacy, task execution time, etc.) before clustering. After the clustering step, the Earliest-Deadline-First (EDF) algorithm is used to schedule the tasks in each cluster to the edge or cloud nodes. The experimental results show that the AE-SOM method achieves a better missed-task rate. For instance, the AE-SOM method is better than the SOM method by 3.19% and the H-SOM method by 4.23% in terms of missed-task rate. In terms of memory and bandwidth costs, the AE-SOM method costs 305.75 (G$) less than the SOM and H-SOM average cost.

D. Minimization of Communication Cost

In [291], the authors investigate a machine learning approach to predict the task execution time and the scheduling failure probability. Seq2Seq NN and RL approaches are used to improve the scheduling decisions. The proposed method can identify a near-optimal scheduling decision by presenting all possible scheduling choices to the system. During the task execution phase, a vector is generated for each possible scheduling decision based on the task models and data sources. The generated vector is then passed to the neural network for a scheduling decision. This process is repeated until all tasks are completed. The RL approach is compared with Round Robin (RR), First Come First Served (FCFS), and Random (RN). The authors use three bioinformatic applications to evaluate the performance of the algorithms, namely pangenome analysis, phylogenetic profiling, and metagenomics. The experimental results based on the pangenome analysis application prove that the RL approach is 20% faster than the FCFS algorithm. The RL approach also has 50% lower network transfers than the FCFS algorithm and achieves almost zero failed tasks. For the phylogenetic profiling application, the RL approach is 25% faster than the RR algorithm in terms of execution time. Finally, for the metagenomics application, the RL approach is 16% faster than the FCFS algorithm and reduces the network transfer cost by 24% compared to the FCFS algorithm. In sum, the experimental results show that the scheduling based on the RL approach outperforms the RR, FCFS, and RN algorithms in terms of execution time and network traffic cost.

In [292], the authors address the problem of scheduling N independent wireless links in a dense wireless network with the objective to maximize the sum-rate. The wireless link scheduling problem consists of selecting a subset of links in any given transmission time slot with the goal to maximize a certain network utility function of the achieved long-term average rates. To achieve this goal, the authors propose a deep learning-based algorithm, which learns the optimal scheduling policy based on the geographical locations of the neighboring transmitters and receivers. Particularly, they propose a DNN with three main stages: a convolution stage, a fully connected stage, and a feedback connection stage. The convolution stage aims to capture the interference patterns of neighboring links using geographic location information. In this stage, a spatial convolution approach is used to estimate the total interference generated by the transmitter and the receiver. Concerning the fully connected stage, it is responsible for capturing the non-linear functional mapping of the optimized schedule. It takes a vector of links as input and gives an output x_i ∈ [0, 1], where x_i = 1 if the link is scheduled, and x_i = 0 otherwise. Finally, the feedback connection stage is introduced between the iterations of the neural network to update the optimization state. This stage works as follows: after the (t − 1)th execution of the convolution stage and the fully connected stage, the vector x ∈ [0, 1]^N, which represents the activation status of the links, is obtained. Then, a new convolution stage and fully connected stage begin with density grids so that the activation status of all N wireless links is updated for the next interference estimations. Finally, the scheduling decisions of the N links are determined after a fixed number of neural network iterations by quantizing the x vector from the last iteration into binary values.
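The feedback-iteration idea of [292] can be illustrated with the toy sketch below, where a geometric path-loss kernel stands in for the spatial convolution stage and a fixed SINR threshold stands in for the trained fully connected stage; the locations, path-loss exponent, noise level, and threshold are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6                                        # number of wireless links (toy example)
tx = rng.uniform(0, 100, size=(N, 2))        # transmitter coordinates (metres)
rx = tx + rng.uniform(-10, 10, size=(N, 2))  # each receiver close to its transmitter

# Stand-in for the spatial convolution stage: pairwise path-loss weights
# derived purely from geographic locations.
dist = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=2) + 1e-3
gain = dist ** -3.0                          # gain[i, j] = gain from tx_i to rx_j

noise = 1e-7
x = np.ones(N)                               # initial activation status of all links
for _ in range(5):                           # fixed number of feedback iterations
    # Interference seen by each receiver under the current activation pattern.
    interference = gain.T @ x - np.diag(gain) * x
    sinr = np.diag(gain) / (noise + interference)
    # Stand-in for the trained fully connected stage: keep a link active only
    # if its SINR under the current schedule exceeds a fixed threshold.
    x = (sinr > 2.0).astype(float)

print("scheduled links after feedback iterations:", np.flatnonzero(x))
```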


TABLE XII: Summary of Studies on ML and DL for Task Scheduling in MEC


Ref. | Learning Type | Algorithm | Mathematical Model | Simulation Tools | Optimization Criteria
2020, [293] | SL | CNN | - | - | Security-awareness of deep network embedded devices
2019, [291] | SL+RL | Seq2Seq | - | Okeanos Cloud | Minimize the execution time, network traffic cost, and failure rate
2021, [76] | USL+SL | SOM+AE | - | MATLAB | Minimize the rate of missed tasks and cost
2020, [280] | RL | DQN | - | - | Minimize the execution time and running time
2019, [292] | USL | DNN | - | - | Maximize the sum-rate
2020, [287] | RL+SL | SVM + Clustered RL | MDP | AIOPS | Minimize the execution time and energy consumption
2019, [285] | RL+SL | QL + SVM | - | - | Minimize the energy consumption while maximizing the application performance
2014, [245] | RL | QL | MDP | MATLAB | Minimize the response time
2018, [277] | RL | QL | - | OptorSim | Minimize the execution time
2016, [294] | RL | QL | - | CloudSim | Minimize the execution time
2020, [243] | RL | QL | Queueing theory | CloudSim | Minimize the energy consumption while reducing task response time and maximizing CPU utilization
2019, [284] | RL | DRL | MDP | Python-based | Minimize the energy consumption at the IoT device while satisfying the response time
2019, [283] | RL | DQN | Markov game | EC2 Cloud | Minimize both the execution time and execution cost
2020, [279] | RL | DRL | POMDP | DS3 | Minimize the execution time
2019, [251] | RL | DRL | MCTS | Python (Theano) | Minimize the execution time and running time
2019, [248] | RL+SL | DRL+LSTM | MDP | Python-based | Minimize the CPU utilization cost and RAM utilization cost
2019, [244] | RL | Q-Learning | - | CloudSim | Minimize the execution time

E. Lessons Learned From ML/DL-based Task Scheduling in MEC

A summary of studies on ML and DL for task scheduling in MEC is illustrated in Table XII. The majority of ML/DL-based scheduling methods aim to minimize the application execution time or to find a trade-off between response time and resource utilization costs. Besides, most of the studies used QL and DRL methods to solve the task scheduling problem.

From the surveyed papers focused on ML/DL for task scheduling in MEC, we learned the following main lessons:

• For a large dataset, ML/DL-based scheduling methods outperform the traditional heuristic scheduling methods. The first reason is that the performance of a learning method strongly depends on the amount and quality of the training dataset. The second reason is that ML/DL methods can predict and extract the task's features (e.g., task dependencies, communication costs, and QoS requirements) by learning from previous experiences [295]. Therefore, ML/DL methods can provide the optimal task priority order, which can significantly reduce the scheduling length (makespan). Also, DL methods, in particular the differentiable neural computer (DNC), are capable of training on and remembering previous hidden states of the input data. Hence, the DNC can accelerate the learning process and enable the agent to continue learning the policy when the network is uncertain and time-varying.

• Dividing an ML-based application into multiple machine learning tasks can significantly improve the scheduling decision. In this way, the ML tasks can learn from each other to improve their rewards. For instance, by dividing the scheduling problem of multi-task transfer learning in MEC into clustered reinforcement learning (CRL) and SVM models, the processing time and energy consumption are reduced [287]. Furthermore, the CRL model can make scheduling decisions based on the relation between the clusters, while the SVM model can predict the task's features and dynamically adjust the CRL model's scheduling decisions.

• The efficiency of ML/DL-based scheduling schemes is strongly related to the type of algorithms used for both task prioritization and server selection. For instance, when a Q-Learning method is used to calculate the task priority and the earliest finish time (EFT) is used to select a server for a task, the convergence rate of the algorithm is lower, as proved in [244]. This is due to the fact that the Q-learning algorithm uses a Q-table to calculate the action-value function, in which each Q-value must converge before attaining the optimal policy. On the other hand, when the upward rank value [281], i.e., the critical path approach, is used to calculate the task priority, and a DQN method is used to select a server for a task, the convergence rate is higher [280]. Therefore, it is crucial to choose the appropriate ML/DL method for both task prioritization and server selection. (A short sketch of the upward-rank computation is given after this list.)

• The running time of DRL-based mechanisms for task scheduling can be significantly reduced by integrating the LSTM method in the learning process. The reason is that LSTM can predict the long-term dependencies that exist between the task's QoS and the specifications of the MEC servers by exploring its memory cell, in which the previous long-term dependencies have been stored. This is also proved in [296], where an automated task scheduling method based on DRL and LSTM has been proposed to minimize both the CPU utilization cost and the RAM utilization cost.


VIII. ML AND DL-BASED METHODS FOR JOINT RESOURCE ALLOCATION IN MEC

While the above surveyed works address the task offloading problem and the task scheduling problem separately, this section surveys current works addressing the joint resource allocation (i.e., joint task offloading and scheduling) problem in MEC using ML/DL techniques. The offloading decision directly impacts the scheduling strategy because the offloaded tasks have different QoS requirements (e.g., latency, security, execution time, etc.), and the MEC resources are limited [4]. Furthermore, the task offloading decision in MEC impacts the application transmission delay and power consumption, which leads to an extra scheduling length. Hence, it is vital to jointly address the task offloading problem and the task scheduling problem. We classify the research in this area into studies focused on the minimization of the energy consumption (VIII-A), minimization of the execution delay under energy constraints (VIII-B), minimization of latency (VIII-C), and privacy preserving (VIII-D).

A. Minimization of Energy Consumption

In [11], the authors investigate the joint task resource allocation problem in MEC. They consider a MEC system with N mobile users, each with M independent tasks to be offloaded to the edge servers, with the objective to minimize the overall offloading cost in terms of energy consumption, computation cost, and delay cost. In this regard, the authors propose a DQN-based joint task offloading and bandwidth allocation algorithm. The authors formulate the problem as a DQN problem with a state space, an action space, and a reward function. The state space is considered as a 1 × (NM + 2N) vector, which involves all users' offloading decisions x_nm and the bandwidth allocations. That is, the offloading decision space is x_nm ∈ {0, 1} for n = 1, 2, ..., N and m = 1, 2, ..., M. Concerning the action space, it is defined as an index selection, which determines how the offloading decision is changed. This index also indicates whether the uplink and downlink bandwidth is increased or decreased for the mobile users. Finally, the reward of the state-action pair is r_{s,a} ∈ {1, −1, 0}. The experimental results show that the proposed DQN-based algorithm outperforms the MUMTO algorithm [297] in terms of overall offloading cost and convergence.

While in [11] the application's tasks are considered as independent, the offloading decision for dependent tasks and the resource allocation in MEC are investigated in [1]. The main objective in [1] is to simultaneously minimize the task execution time and the energy consumption of the mobile device. To achieve this goal, the authors first formulate the problem as a mixed-integer optimization problem, then propose a DRL-based algorithm with deep neural networks (DNNs) to learn the optimal mapping between the states and the actions. The wireless channels and edge CPU frequency represent the state, and the task offloading decisions represent the actions. The DRL uses the actor-critic learning scheme, which periodically trains a DNN in the actor network from past experiences to learn the optimal mapping between the states and the actions. The proposed DRL-based algorithm has two main phases: the task offloading action generation based on the DNN, and the offloading policy updating. To generate an offloading decision, the output of the DNN (i.e., a relaxed offloading action) is first quantized into candidate offloading actions. Then, the critic network evaluates the performance of the candidate actions. Next, the candidate action with the lowest energy-time cost is selected as the solution. That is, after generating the candidate offloading actions, the energy-time cost (ETC) of each action is evaluated, and the best offloading action is selected. Finally, the optimal actions learned in the offloading action generation phase are used to update the parameters of the DNN. Simulation results show that the proposed DRL-based algorithm attains up to 99.1% of the optimal ETC.

The authors of [298] investigate a joint resource allocation algorithm for hybrid mobile edge computing (H-MEC) systems. The H-MEC comprises ground stations (GSs), ground vehicles (GVs), and unmanned aerial vehicles (UAVs), all MEC-enhanced to enable IoT devices to offload their computationally intensive tasks. The authors proposed a DL-based online offloading algorithm called "H2O" with the objective to minimize the energy consumption of all IoT devices. The H2O algorithm has two main phases: the offline training phase and the online optimization phase. The offline training step, which requires high computation and storage capacities, is performed in the remote cloud. To find a training sample, which is required to train the DNN, they first propose a clustering method called "large-scale path-loss fuzzy c-means (LS-FCM)" to find the positions of the UAVs and GVs. The UAVs and GVs are then deployed based on their locations. Compared to the conventional FCM method [299], the LS-FCM approach does not allow some GS cluster centers to participate in the iteration process. Then, a PSO-based algorithm called "U-PSO" is applied to solve the offloading decision and resource allocation problems. After that, a supervised learning algorithm is used to train the DNN so that it can be used when the number of IoT devices varies. Next, the trained DNN is implemented for online decisions. That is, after a UE computes its membership values, the DNN outputs its offloading decision and resource allocation results. Compared to the previous work, the H2O algorithm uses both DL and meta-heuristic approaches. Moreover, it exploits the advantage of the PSO algorithm in giving globally optimal solutions and uses the advantage of the DNN in speeding up the real-time decision. Also, compared to traditional DL-based methods that need to input the information of all UEs, the H2O is efficient in hybrid MEC networks with a large number of IoT devices and UEs. The experiment results show that the H2O has better efficiency and accuracy compared to the random offloading, greedy offloading, and standard PSO offloading approaches.
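As a reference point for the LS-FCM clustering step of [298], the snippet below runs the standard fuzzy c-means membership and center updates on synthetic device positions; LS-FCM additionally weights distances by large-scale path loss and keeps the fixed GS cluster centers out of the iteration, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(0, 1000, size=(40, 2))     # synthetic IoT device positions (m)
C, m = 3, 2.0                                   # clusters (e.g., candidate UAV/GV sites), fuzzifier
centers = points[rng.choice(len(points), C, replace=False)]

for _ in range(50):
    # Membership update: u[i, j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-9
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
    # Center update: weighted mean of the points with weights u^m.
    w = u ** m
    centers = (w.T @ points) / w.sum(axis=0)[:, None]

print("cluster centers (candidate server positions):")
print(centers.round(1))
```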


The drawback of both above-mentioned works on joint resource allocation is that they do not consider the user's QoE. For this reason, the authors of [300] introduce a DRL-based method to address the joint offloaded task scheduling and resource allocation in vehicular networks. Compared to the previous work, the main goal of [300] is to minimize not only the execution delay but also the energy consumption while maximizing the user's QoE. Due to the complexity of the joint offloading and scheduling problem, the authors divide the problem into two sub-problems: vehicle offloading task scheduling and the resource allocation decision. The first sub-problem is solved by a two-sided matching method with the aim to maximize the total utilities reflecting the user's QoE level. An algorithm called "Dynamic V2I Matching (DVIM)" is introduced to find the optimal match. The DVIM algorithm first initializes the forbidden list (i.e., offloading tasks rejected by an RSU) and the accepted list (i.e., offloading tasks accepted by an RSU). Then, each vehicle i calculates its utility u_{i,k} if its task is offloaded to RSU k. Next, all vehicles build their RSU preference lists P_i in descending order of u_{i,k}. After that, the vehicles that have been matched with fewer than q_v RSUs submit offloading requests to the preferred RSU in P_i. The selected RSU is then removed from the preference list. After the vehicles submit their requests, the RSUs accept those requests that increase the overall utility values. The vehicles continue submitting requests until their preference lists are empty. Concerning the second sub-problem, it is solved by an improved DRL method called MADD. The MADD algorithm first initializes the experience replay buffer D with N transitions, the action-value function Q with random weights θ, and the target Q-network, which gives the temporal difference (TD) target. Then, it schedules the offloading requests. In each phase, a random RSU is selected from the available list with probability ε; otherwise, the RSU with the largest Q-value is selected using a greedy approach. After that, the immediate reward r_t and the next state are observed. In the NN training phase, a DQN randomly samples transitions from the buffer D. For each sample j, if the next state is the last state, then the TD target is r_j; otherwise, the DQN is used to calculate the TD target. Then, the gradient descent method is used to update the Q-network parameters. Finally, the TD target network parameters and the random probability ε are updated at every step to accelerate the convergence speed. Simulation results show that the DVIM and MADD algorithms outperform the baseline algorithms. For instance, in terms of utilities, the DVIM algorithm is 20% better than the greedy algorithm and 50% better than the random algorithm. In terms of average QoE, the MADD algorithm outperforms the DQN method by 15%, the Q-learning method by 25%, and the greedy method by 35%.

In [301], the authors use three learning techniques, namely LA, LSTM, and RL, to address the joint computation offloading and resource provisioning problems in an edge-cloud environment. The main objective is to maximize the CPU utilization while minimizing the execution time and energy consumption. To this end, the authors first propose a learning automata (LA)-based algorithm to make a decision about offloading the incoming workload tasks to the edge servers or the cloud servers. If the task is to be computed in the edge environment, the master edge server submits the job to the slave servers. For n requests as input, the LA-based algorithm first initializes the action probabilities, denoted by P(1), P(2), ..., P(n), which are all equal. Then, the first action "a" is selected randomly. In each period, "a" requests are executed in the edge server and the cloud server. If the total execution time of the "a" requests in the edge server is less than the one in the cloud server, then the selected action is penalized, and the corresponding probability is reduced, while the remaining action probabilities are increased; otherwise, the selected action is rewarded. Finally, the algorithm selects the optimal action, which corresponds to the number of requests that can be executed in the cloud server. After the procedure of the LA-based algorithm for the offloading decision, the LSTM and QL techniques are used for resource provisioning. In particular, the LSTM model is used to predict the future number of requests, while the QL is used to find a suitable number of edge servers required to process the dynamic workloads. The experimental results based on real workloads from [301] show that the proposed hybrid learning technique can reduce the average execution time by up to 8.3% compared with the fuzzy-based offloading (FO) algorithm proposed in [302], and by up to 11.3% compared to the post-decision state (PDS)-based online learning algorithm presented in [303].
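The learning-automata stage of [301] can be sketched as a linear reward-penalty probability update over a set of candidate actions; the action set, learning rates, and synthetic feedback rule below are illustrative assumptions.

```python
import random

actions = list(range(1, 11))                 # action k = "execute k requests at the edge" (hypothetical)
prob = [1.0 / len(actions)] * len(actions)   # equal initial action probabilities
alpha, beta = 0.1, 0.05                      # reward and penalty learning rates (assumed)

def update(chosen: int, rewarded: bool) -> None:
    """Linear reward-penalty update of the action probability vector."""
    for i in range(len(prob)):
        if rewarded:
            prob[i] = prob[i] + alpha * (1.0 - prob[i]) if i == chosen else prob[i] * (1.0 - alpha)
        else:
            prob[i] = prob[i] * (1.0 - beta) if i == chosen else \
                      beta / (len(prob) - 1) + prob[i] * (1.0 - beta)

random.seed(0)
for _ in range(200):
    chosen = random.choices(range(len(actions)), weights=prob)[0]
    # Environment feedback: a synthetic rule standing in for the edge-vs-cloud
    # execution-time comparison described in [301].
    rewarded = actions[chosen] >= 6
    update(chosen, rewarded)

best = max(range(len(actions)), key=lambda i: prob[i])
print("converged action:", actions[best], "with probability", round(prob[best], 3))
```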


B. Minimization of Execution Delay under Energy Constraints

Offloading tasks to the MEC servers reduces the energy consumption of the IoT device since the execution is done on the remote edge servers. On the other hand, task offloading, especially in ultra-dense networks, also incurs additional execution delays, which include the delay to send the application to the MEC servers and the delay to receive the results of the computation from the MEC servers [304]. Therefore, it is vital to investigate the trade-off between task execution delay and energy consumption of IoT devices.

The authors of [305] present a joint task offloading and scheduling decision mechanism to minimize the energy consumption of the application while satisfying the overall execution delay. The optimization problem is formulated as a Lyapunov optimization problem. To solve the problem, an online Q-learning algorithm is proposed. It first transforms the joint problem of task offloading and scheduling into a Lyapunov drift-plus-penalty (LDPP) optimization, which is a popular method used for the optimization of queueing networks and stochastic systems [306]. Then, the algorithm obtains the optimal offloading mechanism using a QL-based offloading method. The QL-based offloading method is formulated as an MDP. It first initializes the offloading vector of all tasks with random values. Then, the state transition is decided either by choosing the task that has the highest LDPP reward value with a probability 1 − δ or by choosing a task randomly with a probability δ. The next phase of the QL-based offloading method is the action selection policy. To select an action, the agent first identifies a set of actions with a positive Q-value. If there is no action with a positive Q-value, an action is chosen randomly; otherwise, it uses an ε-greedy policy to choose an action. Furthermore, given a state, the agent with the ε-greedy policy chooses the action with the maximum Q-value with a probability 1 − ε, and chooses a random action with a probability ε, where ε is very small. Next, the Q-matrix is updated by a positive or negative reward generated by the state transitions. Finally, the optimal offloading decision is obtained when the reward function cannot generate any positive reward. After obtaining the optimal offloading policy, the next stage of the algorithm is the scheduling and migration decision to improve resource utilization. To this end, a Lagrange migration (L-migration) method is introduced. The L-migration method first calculates the edge premigration cost, also called the "premigration marginal migration cost (MMC)". After that, the sum of the minimal MMC and transmission cost is compared with the maximal one to decide whether to start the migration. If this sum is greater than or equal to the maximal MMC value, then no migration is required. Otherwise, the algorithm starts the iterative process to determine the optimal migration policy. The task with the lowest delay requirements has higher priority to migrate in order to avoid migration congestion delay for delay-sensitive tasks. It is then assigned to the edge node with the highest load rate. The simulation results show that the proposed joint offloading and scheduling algorithm outperforms the delay-optimal algorithm [307] (which only considers the delay cost of task offloading, ignoring the energy consumption), the energy-optimal algorithm [303] (which only considers the energy cost, ignoring the delay cost), and the T2E algorithm [308] (which only focuses on delay optimization between the edge and terminal layers, ignoring energy constraints) in terms of energy saving while achieving low delay cost.

The minimization of the application execution delay while saving the battery power of the mobile user's equipment is the main goal in [309]. When compared to the previous study, the authors of [309] integrate SDN and MEC to propose a novel software-defined edge cloudlet (SDEC) framework for task offloading and scheduling. Particularly, QL and cooperative Q-learning (C-QL) schemes are proposed to solve the joint task offloading and scheduling problem in SDEC. The QL scheme aims to minimize the total execution delay (Sum_delay), while the C-QL aims to reduce the search time of the optimal resource scheduling. Like most RL approaches, the proposed QL scheme also has three main elements, including the set of states S, the set of actions A, and the reward R. S comprises two components: the Sum_delay of the system and the available computation resource capability C_avail of the MEC server. The set of actions A performed by the agent is the resource allocation v and the computation ratio α. After executing an action a in each time step, the agent obtains a reward R(s, a). Then, each state-action pair obtains a long-term reward Q(s, a). After that, the agent computes and stores Q(s, a) in a Q-table. This iteration is repeated until the Q-learning method converges to the optimal value of Q. Concerning the C-QL scheme, it allows agents to learn from each other in order to reduce the search time of the proposed QL scheme. Furthermore, the C-QL scheme reduces the communication time among the base stations by sharing useful information among them. Simulation results show that the proposed scheme reduces the execution delay by up to 62.10%. Also, the proposed C-Q-learning scheme achieves better performance than other benchmark approaches in terms of delay requirements.

Another idea aiming at the minimization of the execution delay of the mobile application while minimizing the energy consumption is introduced in [310]. Compared to the previous study that used LTE as the wireless technology, [310] uses Narrowband-IoT (NB-IoT) to transmit data to the base station (BS). The optimization problem is formulated as a continuous-time MDP (CTMDP) model, where the states of the system transit only when a packet (arrival/departure) event occurs, but not at every time slot. To solve the problem, the authors propose a combination of methods including approximate dynamic programming (ADP), temporal-difference learning (TDL), and semi-gradient descent (SGD). The main idea of the TDL method is to make the learner's current prediction for the current input pattern more closely match the next prediction at the next time step [311]. While most RL approaches are based on Q-values, the proposed algorithm uses TDL with a post-decision state (PDS). The main advantage of the PDS is that it does not require information about the transition probabilities to find the optimal action to take. It also has a much smaller state space compared to Q-value approaches. The proposed algorithm applies uniformization to the CTMDP model to avoid a heavy signaling overhead on the IoT devices. Particularly, the proposed algorithm has five main steps: initialization, local state updating, optimal control action, post-decision local state updating, and per-node value function updating. The first phase initializes the per-node value functions V_{n,k} of each IoT device n, where k denotes the index of the decision epoch. Then, when an event e_k (i.e., a packet arrival or departure) occurs at an IoT device, the BS sets k = k + 1, and the kth decision epoch begins. The BS notifies the second and third IoT devices to update their local states. Based on the event e_k, an action is determined. After that, each IoT device updates its post-decision local state. Finally, each IoT device updates its per-node value function under the PDS.

C. Minimization of the Latency

The latency minimization of IoT users in large-scale MEC networks is the main goal in [312]. Toward this end, a DRL algorithm is proposed, which comprises three methods, namely 2r-SAE, ASA, and 2p-ER. The 2r-SAE is a stacked autoencoder approach, which provides quality data to the DRL model by compressing and representing the high-dimensional data. Hence, the 2r-SAE can reduce the state space and improve the learning efficiency of the DRL. Concerning the ASA method, it tries to find the optimal action for the DRL model to produce an offloading decision with the observed state. The 2p-ER is introduced to enhance the learning process of the DNN in the DRL model. In the proposed DRL algorithm, the agent interacts with the environment in discrete decision epochs. At each epoch t, the agent takes an action based on the state s_t, then the environment generates a reward r_t. After that, the ASA is used to find the optimal action a*_t, and the state-action pairs (h_t, a*_t) are put into the experience replay (ER) buffer for DNN (agent) learning. Next, a batch of transitions is selected from the buffer by priority. Moreover, a transition which induces an evident decrease of the loss function will have a higher priority, while a transition which cannot enhance the performance of the DNN will have a lower priority. Simulation results prove that the proposed algorithm outperforms existing benchmarks in terms of latency.

A joint spectrum allocation and scheduling scheme for V2V broadcast communications is proposed in [313]. The main objective is to address the strict latency constraints on V2V links while minimizing the interference to V2I links. To this end, the authors propose a DRL mechanism, where a DQN is used to find the optimal policy for the problem. The simulation results show that each vehicle satisfies the latency constraints and minimizes the interference to V2I links.
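Returning to the 2r-SAE component of [312], the minimal PyTorch sketch below trains a small autoencoder to reconstruct synthetic high-dimensional observations; the encoder output would then serve as the compact state handed to the DRL agent. The layer sizes, data, and training settings are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
raw_dim, code_dim = 128, 8                 # high-dimensional observation -> compact state

encoder = nn.Sequential(nn.Linear(raw_dim, 32), nn.ReLU(), nn.Linear(32, code_dim))
decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(), nn.Linear(32, raw_dim))
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

data = torch.randn(1024, raw_dim)          # synthetic stand-in for channel/state samples
for epoch in range(200):
    recon = decoder(encoder(data))
    loss = loss_fn(recon, data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The encoder output is the compressed state passed to the DRL agent.
compact_state = encoder(data[:1]).detach()
print("reconstruction loss:", round(loss.item(), 4))
print("compressed state:", compact_state.numpy().round(2))
```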


The minimization of the task offloading latency is also the main objective in [314]. Toward this end, the authors propose an online learning algorithm based on the Multi-Armed Bandit (MAB) framework by jointly optimizing the task offloading decision and the spectrum scheduling decision. The simulation results show that the proposed algorithm is better than the UCB algorithm [315] in terms of delay performance. Another MAB-based learning algorithm for task-intensive offloading and balancing in MEC-enabled vehicular networks is proposed in [316]. The proposed algorithm enables individual MEC servers to learn global knowledge iteratively. Multiple players compete for multiple arm machines, where each player selects one of the arm machines and accordingly obtains a reward.

D. Privacy Preserving

In [252], the authors address three main issues faced by existing methods for computation offloading and resource allocation in MEC. These issues include security and privacy, cooperative computation offloading, and dynamic optimization. To address the first two issues, they employ blockchain technology, in particular a consensus approach to ensure data security, and use cooperative communication to offload the computation tasks from mobile devices to the MEC system. Concerning the dynamic optimization problem, a Markov decision process (MDP) is formulated and a deep reinforcement learning algorithm is proposed to solve the MDP problem. Particularly, an algorithm based on "asynchronous advantage actor-critic (A3C)" reinforcement learning is proposed to solve the problem. A3C is a fast parallel reinforcement learning method that utilizes multiple CPU threads on a single machine to learn more intelligently and efficiently. The experimental results show that the proposed algorithm converges fast and performs better than the fixed block size (FBS) and fixed block interval (FBT) schemes.

E. Lessons Learned From ML/DL-based Joint Resource Allocation

A comprehensive summary of studies on ML/DL for joint resource allocation in MEC is given in Table XIII. The majority of ML/DL-based Joint Task Offloading and Scheduling (JTOS) methods in MEC aim to minimize the energy consumption of the IoT device or to minimize the execution delay while satisfying the energy constraints. Most of the works in this area formulated the problem as an MDP problem. Then, the MDP problem is solved using DRL techniques to find the optimal JTOS decisions. Besides, the majority of the proposed mechanisms are implemented using TensorFlow.

After deeply surveying the works addressing joint resource allocation issues in MEC, we list the following key lessons:

• The performance of the ML/DL-based algorithm for joint resource allocation strongly depends on both the offloading policy and the scheduling policy used by the algorithm. This is due to the fact that the offloading results depend on the capabilities of the target edge computing servers that are responsible for scheduling and executing the offloaded tasks.

• A combination of learning techniques can significantly improve the joint resource allocation algorithm because each learning method can accomplish a specific task to maximize the long-term reward. For instance, it is proved in [298] that by combining the DNN, clustering, and PSO methods, the joint resource allocation algorithm obtains better efficiency and accuracy compared to the greedy, random, and PSO algorithms.

• The stacked autoencoder (SAE) is an efficient method for latency minimization in large-scale MEC networks because it can provide the good-quality training data required by the DL-based method. Indeed, the SAE method can compress and represent the high-dimensional data generated by IoT devices. Hence, it can reduce the state space and provide quality data to the DL method used for the joint resource allocation [312]. Therefore, it can accelerate the learning process.

• To provide better QoS/QoE for the end-users, it is crucial to execute the DL training process on the remote cloud because the DL training procedure requires powerful computing resources. Furthermore, by executing the offloading and scheduling training procedures on the cloud servers, we save the limited resources of the edge servers to meet the QoS (e.g., for delay-sensitive applications) and QoE (e.g., video quality received by end-users).

• The efficiency of ML and DL methods for joint resource allocation is strongly related to the type of task being offloaded, in the sense that the training of dependent tasks (i.e., task graphs) is more complex than the training of independent tasks. Indeed, in the case of dependent tasks, the task dependencies must be preserved during both the offloading and scheduling processes. Furthermore, if a task t_i preceding a task t_k is offloaded to the MEC server for training (due to IoT device resource constraints), the result must be sent back to the IoT device before considering the task t_k. One possible solution to this challenge is gathering all dependent tasks on the same MEC server, as proposed in [12].
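One way to realize the last lesson above, i.e., keeping dependent tasks together as proposed in [12], is to group the task graph into connected components and pin each component to a single MEC server. The task graph and server names in the sketch below are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependent-task application: an undirected view of the task graph
# is enough to find groups of tasks that must stay together on one MEC server.
edges = [("t1", "t2"), ("t2", "t4"), ("t3", "t4"), ("t5", "t6")]
tasks = {"t1", "t2", "t3", "t4", "t5", "t6", "t7"}          # t7 has no dependencies

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def component(start, seen):
    """Collect every task reachable from `start` through dependency edges."""
    stack, group = [start], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        group.add(t)
        stack.extend(adj[t])
    return group

seen, groups = set(), []
for t in sorted(tasks):
    if t not in seen:
        groups.append(component(t, seen))

servers = [f"mec-{i}" for i in range(1, len(groups) + 1)]   # hypothetical server names
placement = {srv: sorted(grp) for srv, grp in zip(servers, groups)}
print(placement)   # every chain of dependent tasks ends up on a single server
```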


TABLE XIII: Summary of Studies on ML and DL for Joint Resource Allocation in MEC
Ref. | Learning Type | Algorithm | Problem Formulation | Simulation Tools | Optimization Criteria
2020, [1] | DRL | DNN | Mixed-integer optimization | TensorFlow | Minimize the task execution time and energy consumption of the mobile device
2019, [11] | RL | DQN | - | TensorFlow | Minimize the execution time, energy consumption, and delay cost
2020, [301] | RL+SL | LA + QL + LSTM | - | iFogSim | Maximize the CPU utilization while minimizing the execution time and energy consumption
2020, [305] | RL | QL | Lyapunov theory + MDP | MATLAB | Minimize the energy consumption of the application while satisfying the overall execution delay
2020, [309] | RL | Q-Learning | - | OpenAI Gym | Minimize the execution delay of the mobile application while saving battery power of the mobile device
2019, [310] | RL | TD-learning | MDP | - | Minimize the execution delay of the mobile application while minimizing the energy consumption
2020, [298] | SL | DNN + PSO | MINLP | - | Minimize the energy consumption of all user devices
2018, [313] | Deep RL | DQN+DNN | MDP | - | Minimize the latency on V2V links while minimizing the interference to V2I links
2019, [317] | RL | MADL+D3QN | Game + MDP | - | Maximize the long-term downlink utility while satisfying the UE's QoS requirements
2019, [318] | Deep RL | DQN | - | TensorFlow | Maximize both the short-term reward and long-term reward of the DQN model
2020, [194] | SL | DNN | Queueing theory | - | Minimize the end-to-end inference delay of DL tasks
2019, [319] | RL | QL | MDP | - | Minimize the power consumption while maximizing the system throughput
2020, [312] | USL | AE + SA + DNN | - | - | Minimize the latency of IoT users
2019, [300] | Deep RL | DDQN | MDP | TensorFlow | Minimize the execution delay and the energy consumption while maximizing the user's QoE
2020, [252] | RL | DRL | MDP + A3C | TensorFlow | Protect data security while maximizing the computation rate and throughput of blockchain systems

F. Summary of ML/DL-based Resource Allocation in MEC

In summary, ML and DL methods enable efficient resource allocation in MEC through their ability to predict both the task's features (e.g., QoS, QoE, security, privacy requirements, etc.) and the target MEC resource capabilities. Particularly, DL and DRL can extract complex features from large amounts of high-dimensional data generated by IoT devices. Furthermore, with the recent progress in mobile communication such as 5G and beyond, it is crucial to embed artificial intelligence (AI) into MEC systems. Current ML and DL algorithms are good approaches to address the resource allocation problem in MEC networks. ML-based mechanisms for resource allocation require datasets to train from. Then, the trained model is applied to the real dataset to achieve the optimal resource allocation policy. However, the trained model may not adapt well to the entire features and properties of the data. Hence, DL techniques have been used to address some of the limitations of ML mechanisms.

Nevertheless, the application of ML and DL techniques for resource allocation brings new challenges that need to be addressed. In particular, these techniques require a large amount of good-quality datasets to learn from, which are often scarce and also difficult to generate. For instance, the training model of the DRL-based task offloading method in large-scale MEC networks requires quality data from the IoT users to improve the learning efficiency [312]. Also, although some datasets may be available online, their privacy protection must be considered during the learning process. Therefore, data generation and privacy preservation are other issues that need to be addressed to develop efficient ML/DL-based resource allocation algorithms.

Another challenge is that the collected dataset cannot be generalized to all types of MEC tasks because different tasks have different structures, attributes, and requirements. For instance, in most research papers, MEC tasks are considered as a bag of tasks (BoT), where the tasks are independent, and each task has its input location id, input size, and number of task execution instructions. Other studies considered the MEC tasks as dependent tasks represented by a Directed Acyclic Graph (DAG) G = (T, E), where T is a set of t tasks, and E is a set of e edges. Each task t represents a set of instructions that must be executed on the same computing resource. Each edge corresponds to the precedence constraints among the tasks. Therefore, the collected dataset from different sources cannot be generalized to all MEC tasks due to the heterogeneity and distributed features of the dataset sources. Consequently, it is challenging to find a unified task model for resource allocation in MEC.

Additionally, the ML/DL training models are rarely static. Therefore, the models may need to be retrained on slightly changed datasets (e.g., when data points have been added or deleted) [320]. Generally, an ML/DL model needs to be retrained from scratch when the data distributions have deviated significantly from those of the original training dataset. This is known as model drift. In terms of latency and promptness, model retraining at the edge of the network has the advantage of reducing the communication bandwidth and latency between the IoT device and the remote server, since it does not require submitting data from IoT networks to the cloud [321]. However, it is expensive to retrain models from scratch. The training model needs to be refined and adapted with the new sub-dataset when the new dataset is similar to the dataset observed in the past on which the model was trained.

The main reason a model needs to be retrained is that the environment in which the model makes predictions keeps changing, and consequently the dataset changes, causing model drift. Nevertheless, how and why the dataset changes depends on the use case, such as small datasets and adversarial environments. For instance, if the model was not initially trained with a large enough dataset, the model accuracy may vary significantly between training and testing. Therefore, the model can be retrained with a training dataset that contains new observations and increases its size. In an adversarial environment (e.g., intrusion and fraud detection), where the environment behavior is actively trying to reduce the rewards given to the agent, model retraining becomes much more crucial. Hence, model retraining mechanisms capable of considering the adversarial environment should be further investigated.
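A simple way to operationalize the model-drift discussion above is to monitor the rolling prediction error online and trigger retraining once it exceeds a tolerance; the window length, threshold, and synthetic error stream below are illustrative assumptions rather than a prescription.

```python
from collections import deque
import random

WINDOW, THRESHOLD = 200, 0.15        # sliding-window length and error tolerance (assumed)
recent_errors = deque(maxlen=WINDOW)

def record_error(error: float) -> bool:
    """Track the rolling mean prediction error; return True when retraining is warranted."""
    recent_errors.append(abs(error))
    return len(recent_errors) == WINDOW and sum(recent_errors) / WINDOW > THRESHOLD

# Toy usage: prediction errors are small at first, then the data distribution drifts.
random.seed(0)
for step in range(1000):
    drift = 0.0 if step < 600 else 0.3            # synthetic distribution shift
    if record_error(random.gauss(drift, 0.05)):
        print(f"drift detected at step {step}: schedule model retraining")
        recent_errors.clear()                      # restart monitoring after retraining
```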


for resource allocation in MEC. C. Deep Learning Models Caching


Most of the studies that focused on content caching ap-
A. Trade-off Between Large-Scale Training Datasets and proaches used deep learning to improve the traditional caching
Computation Delay strategies. Furthermore, few works have investigated strategies
The main challenge of DL algorithms is that they require to cache the DNN model at the edge of the network. However,
enough high-quality training datasets, which are difficult to DNN model caching at the edge network may enhance the
collect or generate, and also may not be available. Indeed, learning process and reduce inference time. Indeed, caching
the training dataset needs to be collected from multiple IoT the DNN model on the edge node reduces the volume of
devices, which is challenging due to the heterogeneity and input data that needs to be sent to the remote cloud server
distributed features of these devices [322]. One possible solu- and therefore reduces the inference latency [198]. Therefore,
tion is to move the trained model to mobile devices. However, researchers need to investigate novel DNN model caching
most IoT devices have limited battery capacity and compu- approaches instead of DL methods to improve traditional
tation/storage capabilities to execute deep learning models content caching approaches. A recent study on DNN model
which required high computation and storage capabilities. It caching is CacheNet [199], which caches the low-complexity
is for this reason that many studies have proposed methods DNN models on IoT devices, and the high-complexity DNN
to offload the trained model to the edge servers rather than models on edge/cloud servers.
running it on mobile devices. For instance, DNNOff [193]
aims to automatically determine the DNN models that should D. Integration of Blockchain and ML/DL for Resource Allo-
be offloaded to the edge servers. cation in MEC
Also, large training datasets are required to well analyze
Blockchain can be defined as a distributed data structure
and compare the efficiency of different related DL-based RA
consisting of a chain of blocks that keeps records of all
methods. For example, it is proved in many studies (e.g., [244])
transactions in the blockchain network while ensuring security
that different learning algorithms can provide similar perfor-
[323]. Each block is identified by a unique cryptography hash
mance for small datasets. On the other hand, high dimensional
function. The blockchain network comprises nodes that record
datasets for model training can lead to unacceptable delays
the same transactions. Due to its security and decentralization
because the training process of DL algorithms, in particular,
features, blockchain technology has been used in many areas,
DRL algorithms is computationally intensive [34]. Therefore, a
including banking systems [324], finance [325], cloud com-
vital research direction is the trade-off analysis between large-
puting [326], healthcare applications [327], and MEC [328].
scale training datasets and computation delay. One potential
The most well-known application of blockchain technology
solution is the compression of the training dataset as proposed
is the Bitcoin created by Satoshi Nakamoto in 2008 [329].
in [312], where the authors use a stacked autoencoder to
Blockchain technology can also be used to solve the security
compress and represent the high-dimensional dataset.
B. Trade-off Between Convergence Rate and Time Complexity

Accelerating the convergence rate of a training model is one of the most challenging issues in ML and DL techniques. Furthermore, RL-based methods require large amounts of data, which enlarges the state and action spaces and therefore leads to a slow convergence rate. To address this challenge, existing studies combine RL and DL methods. For instance, the work in [76] exploits the features of the autoencoder method in an RL model to accelerate the convergence speed of the traditional RL algorithm. By combining an autoencoder with RL, the encoder can extract the task's features (e.g., task type, task priority, task execution time, etc.). The features obtained by the encoder then help the RL algorithm find the best action in a given state, instead of relying on the traditional Q-table, which suffers from a slow convergence rate. On the other hand, combining different learning methods may increase the algorithm's time complexity, which leads to a high computation delay, and a learning method with high time complexity is not suitable for delay-sensitive and real-time applications (e.g., video streaming). Consequently, researchers need to investigate new mechanisms that accelerate the convergence rate of RL-based methods while minimizing the time complexity.
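As a rough, generic sketch of this combination (not the implementation of [76]), the snippet below feeds encoder-compressed task features to a small Q-network instead of indexing a Q-table. The encoder is assumed to be pre-trained (for example, as part of an autoencoder such as the one sketched earlier), and all dimensions are illustrative.

    import random
    import torch
    import torch.nn as nn

    CODE_DIM, NUM_ACTIONS = 32, 4   # hypothetical feature and action-space sizes

    # Encoder assumed to come from a pre-trained autoencoder.
    encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, CODE_DIM))

    # A small Q-network over the compact task features replaces a tabular Q-table.
    q_net = nn.Sequential(nn.Linear(CODE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))

    def select_action(raw_task, epsilon=0.1):
        """Epsilon-greedy action selection on encoder-compressed task features."""
        if random.random() < epsilon:
            return random.randrange(NUM_ACTIONS)
        with torch.no_grad():
            state = encoder(raw_task)           # compact state representation
            return int(q_net(state).argmax())

    action = select_action(torch.randn(512))    # raw task descriptor (placeholder)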


approaches instead of DL methods to improve traditional content caching approaches. A recent study on DNN model caching is CacheNet [199], which caches the low-complexity DNN models on IoT devices and the high-complexity DNN models on edge/cloud servers.

D. Integration of Blockchain and ML/DL for Resource Allocation in MEC

Blockchain can be defined as a distributed data structure consisting of a chain of blocks that keeps records of all transactions in the blockchain network while ensuring security [323]. Each block is identified by a unique cryptographic hash function. The blockchain network comprises nodes that record the same transactions. Due to its security and decentralization features, blockchain technology has been used in many areas, including banking systems [324], finance [325], cloud computing [326], healthcare applications [327], and MEC [328]. The most well-known application of blockchain technology is Bitcoin, created by Satoshi Nakamoto in 2008 [329]. Blockchain technology can also be used to solve the security and privacy challenges of resource allocation in MEC. A recent investigation towards this approach is introduced in [252], where a consensus approach is used to ensure data security. The authors of [330] depict the limitations of edge intelligence (EI) and explain why blockchain technology could benefit from EI.

The integration of blockchain and ML/DL may improve resource allocation methods in terms of different metrics. On the one hand, the main challenge of ML/DL techniques is that they require enough high-quality training data, which need to be collected from multiple distributed IoT devices. Moreover, the training datasets may contain security-sensitive information, which makes training and inference more challenging since data privacy must be considered during the learning and inference processes. Hence, blockchain technology can be regarded as a complementary technology to address this challenge thanks to its decentralized, privacy-preserving, and secure features. In particular, distributed training and inference can be performed securely.

On the other hand, blockchain itself faces many challenges when it comes to enabling edge intelligence, such as storage load, transaction capacity, and fault tolerance, which prevent many blockchain systems from being implemented [330]. Blockchain also faces a technical challenge when it comes to performing ML/DL tasks, such as training and inference, in the edge network. Hence, DL techniques can benefit blockchain for resource utilization. Specifically, since a DL technique can predict data, it may also facilitate the prediction of the computational tasks that a miner (or a consensus node) needs to offload to the edge/cloud server.

Therefore, the integration of blockchain and ML/DL for resource allocation in MEC is feasible, given the strong current interest in both technologies. Nevertheless, few studies have focused on the integration of lightweight blockchain (i.e., a blockchain that can be applied on resource-constrained devices without affecting its security features [331]) and DL techniques. Consequently, a vital future research direction is the development of mechanisms based on lightweight blockchain and DL for resource allocation in large-scale MEC networks.

E. Resource Allocation under Time-Varying Wireless Channel Conditions

Most of the surveyed works ignored the time-varying wireless channel state information (CSI) in joint resource allocation. However, the CSI significantly impacts the joint resource allocation decision of a wireless-powered MEC system because of the uncertain channel. The existing methods focus either on task offloading under CSI or on task scheduling under CSI, ignoring the joint resource allocation problem under CSI. Therefore, ML/DL algorithms for joint task offloading and scheduling in MEC networks with time-varying wireless channels should be further investigated. A recent study towards this approach is presented in [257].

F. Considering More Computing Resources

Most of the existing ML/DL-based mechanisms for resource allocation in MEC define the target computation resource only in terms of CPU processing capacity. However, ML techniques in general, and DRL techniques in particular, require additional computation resources, such as memory and storage, to obtain a high-accuracy model. In particular, the training model of a DRL algorithm might have specific memory requirements that the target MEC servers have to meet in order to execute the model. Hence, one topic of interest is the investigation of ML/DL-based algorithms that consider additional computing resource constraints, such as memory, storage, and bandwidth.

G. Resource Allocation on Hybrid Architectures

Many studies assumed that the target MEC system comprises a bounded number of similar processors (i.e., related CPU resources). However, more and more high-performance computing systems now use hybrid architectures (i.e., unrelated resources), such as multicore processors and accelerators like GPUs. These novel architectures have introduced new resource allocation issues. For instance, how to jointly select the required CPU and GPU resources to execute the offloaded task or the training model is a big challenge. In particular, DL algorithms require GPU-enabled servers to accelerate model training [332]. Therefore, it is vital to investigate mechanisms that can jointly share the hybrid computation resources (i.e., CPU and GPU) between the offloaded tasks and the training model.

H. Resource Allocation in MEC for ML/DL

In the literature, the majority of works applied ML/DL techniques to improve traditional resource allocation methods without investigating how resource allocation mechanisms in MEC can improve ML/DL techniques. Indeed, since ML/DL are increasingly integrated into MEC, it is vital to investigate resource allocation methods (i.e., offloading, caching, and scheduling) that can accelerate the learning process and the convergence of ML/DL methods. Also, when a DL algorithm is implemented on a resource-constrained IoT device, some DL tasks, in particular delay-intensive tasks (e.g., visual target tracking, online video editing), cannot be processed on the IoT device because they require a lot of computational resources. Therefore, a potential future research direction is the application of offloading, caching, and scheduling techniques to improve ML/DL methods. A recent work towards this approach is presented in [333], where the authors propose a distributed DL task framework in which the lower layers of the convolutional neural network (CNN) model are executed on the unmanned aerial vehicles, while the higher layers of the CNN model are offloaded to the MEC server (a simplified partitioning sketch is given after Subsection I).

I. Deep Learning Inference on Resource-Constrained Devices

Most of the surveyed papers focused on the deep learning training phase to obtain the optimal resource allocation policy, ignoring the inference (prediction) phase of DL. While DL training "teaches" a DNN using datasets to perform an AI task, DL inference uses the trained DNN model to make predictions on new data that the model has never seen before [86]. DL inference is usually a production phase in which a model is deployed to make predictions on real-world data. DL inference is computationally expensive because of the high-dimensional data and the millions of operations that need to be performed on the data [33]. In addition, most DL models are offloaded to and executed in cloud data centers because they require more computation resources than resource-constrained devices can provide. However, offloading and executing DL models on cloud servers or MEC servers [194] cannot satisfy the delay requirements of real-time services such as real-time video analytics and intelligent manufacturing. Therefore, a vital research direction is to propose novel DL models that can be embedded in resource-constrained IoT devices while considering the trade-off between DL inference accuracy and latency.

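One widely used way to act on the accuracy-latency trade-off mentioned above is post-training quantization. The sketch below applies PyTorch's dynamic quantization to a small network that is assumed to be already trained; it is a generic illustration of the technique rather than a method proposed in the surveyed works, and the model and sizes are placeholders.

    import torch
    import torch.nn as nn

    # Assume `model` is an already trained floating-point network.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 8))
    model.eval()

    # Post-training dynamic quantization: the weights of the Linear layers are
    # stored in int8, which reduces model size and typically speeds up CPU
    # inference at a small cost in accuracy.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    sample = torch.randn(1, 128)
    print(quantized(sample).shape)  # same interface, smaller and faster model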
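Returning to the layer-level partitioning discussed in Subsection H, the following sketch splits a small CNN at a hypothetical cut point so that the lower layers run on the device (e.g., a UAV) and only the intermediate feature map is sent to the MEC server, which executes the remaining layers. This is a simplified illustration, not the framework of [333]; the architecture and the cut point are arbitrary.

    import torch
    import torch.nn as nn

    # A small CNN expressed as a flat sequence of layers (hypothetical architecture).
    layers = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
    )

    CUT = 3                       # layers [0:CUT] stay on the device, the rest on the server
    device_part = layers[:CUT]
    server_part = layers[CUT:]

    x = torch.randn(1, 3, 32, 32)  # input captured on the device
    feature = device_part(x)       # computed locally (lower layers)
    payload = feature.detach()     # this intermediate tensor is what would be transmitted
    output = server_part(payload)  # computed at the MEC server (higher layers)
    print(payload.shape, output.shape)

Moving the cut point changes how much computation stays on the device versus how large the transmitted feature map is, which is exactly the kind of trade-off a resource allocation policy would have to balance.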

J. Federated Deep Transfer Learning

Although transfer learning (TL) can significantly accelerate the training process by using the knowledge of an existing trained model, it can negatively impact the learning performance in a target domain if there is a high dissimilarity between the source domain and the target domain. This issue, called negative transfer, is one of the most challenging problems in TL [334]. Negative transfer reduces the accuracy of a machine learning model after retraining. It is mainly caused by a poor dependency between the source and the target domains. To overcome this issue, deep transfer learning (DTL) has been introduced as a new paradigm that uses DL methods to perform an efficient knowledge transfer (i.e., positive transfer) [335]. Indeed, DL has a strong dependence on large training datasets compared to traditional ML techniques because it requires a huge amount of data to understand and extract the hidden patterns. Hence, the utilization of DL to perform TL tasks (i.e., DTL) can significantly mitigate the negative transfer issue. Also, DTL has been proved to achieve high prediction accuracy in various research fields, including fault diagnosis in manufacturing for cross-domain prediction [336], medical applications (e.g., magnetic resonance imaging (MRI) classification for the COVID-19 disease [337]), machine fault diagnosis [338], and networking [339]. However, its application to resource allocation in MEC is still limited. Further research on resource allocation in MEC needs to investigate new DTL mechanisms that consider not only the negative transfer issue but also the challenges of IoT data scarcity and privacy faced by deep learning techniques. A key research direction is the combination of federated learning (to address the privacy and security issues) and DTL (to solve the dataset scarcity and negative transfer issues), which we call federated deep transfer learning (FDTL).

X. CONCLUSION

This paper provides a comprehensive survey and tutorial of ML and DL methods for resource allocation problems in MEC. We first present tutorials that demonstrate the advantages of applying ML and DL techniques in MEC. Then, we discuss potential technologies for quickly running ML and DL tasks (e.g., training and inference) in MEC. We also discuss and summarize key ML and DL methods and their importance for resource allocation in MEC. Afterward, we provide a comprehensive and in-depth survey of recent works that applied ML and DL techniques to address the resource allocation problem in MEC from three aspects: the task offloading problem, the task scheduling problem, and the joint resource allocation problem. Furthermore, the state-of-the-art ML/DL-based resource allocation techniques are reviewed and classified within the scope of this study. Finally, we present an extensive list of challenges and future research directions related to the application of ML and DL for resource allocation in MEC. This survey provides an effective manual that can motivate readers to advance this research field and help them understand how and when ML/DL-based resource allocation techniques perform better than the traditional methods.

REFERENCES

[1] J. Yan, S. Bi, and Y. J. A. Zhang, “Offloading and resource allocation with general task graph in mobile edge computing: A deep reinforcement learning approach,” IEEE Trans Wireless Commun, vol. 19, no. 8, pp. 5404–5419, 2020.
[2] GSMA, “Iot in the 5g era opportunities and benefits for enterprises and consumers.” [Online]. Available: https://www.gsma.com/iot/wp-content/uploads/2019/11/201911-GSMA-IoT-Report-IoT-in-the-5G-Era.pdf
[3] T. Taleb, K. Samdanis, B. Mada, H. Flinck, S. Dutta, and D. Sabella, “On multi-access edge computing: A survey of the emerging 5g network edge cloud architecture and orchestration,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1657–1681, 2017.
[4] H. A. Alameddine, S. Sharafeddine, S. Sebbah, S. Ayoubi, and C. Assi, “Dynamic task offloading and scheduling for low-latency iot services in multi-access edge computing,” IEEE J Sel Areas Commun, vol. 37, no. 3, pp. 668–682, 2019.
[5] P. Mach and Z. Becvar, “Mobile edge computing: A survey on architecture and computation offloading,” IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1628–1656, 2017.
[6] ETSI, “Multi-access edge computing.” [Online]. Available: https://www.etsi.org/technologies/multi-access-edge-computing
[7] N. Boyd, “Mobile edge computing vs multi-access edge computing,” Mar. 2018. [Online]. Available: https://www.sdxcentral.com/edge/definitions/mobile-edge-computing-vs-multi-access-edge-computing/
[8] P. Porambage, J. Okwuibe, M. Liyanage, M. Ylianttila, and T. Taleb, “Survey on multi-access edge computing for internet of things realization,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2961–2991, 2018.
[9] C. o. M. I. Alex Reznik, “Mec proof of concept.”
[10] M. Rodriguez and R. Buyya, “A taxonomy and survey on scheduling algorithms for scientific workflows in iaas cloud computing environments,” Concurr. Comput. Pract. E., vol. 29, no. 8, 2017.
[11] L. Huang, X. Feng, C. Zhang, L. Qian, and Y. Wu, “Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing,” Digit Commun Netw, vol. 5, no. 1, pp. 10–17, 2019.
[12] J. Wang, J. Hu, G. Min, W. Zhan, Q. Ni, and N. Georgalas, “Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning,” IEEE Commun Mag, vol. 57, no. 5, pp. 64–69, 2019.
[13] F. Rebecchi, M. Dias de Amorim, V. Conan, A. Passarella, R. Bruno, and M. Conti, “Data offloading techniques in cellular networks: A survey,” IEEE Commun. Surveys Tuts., vol. 17, no. 2, pp. 580–603, 2015.
[14] R. V. Lopes and D. Menascé, “A taxonomy of job scheduling on distributed computing systems,” IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 12, pp. 3412–3428, 2016.
[15] E. Nunes, M. Manner, H. Mitiche, and M. Gini, “A taxonomy for task allocation problems with temporal and ordering constraints,” Rob. Auton. Syst., vol. 90, pp. 55–70, 2017.
[16] N. C. Luong, P. Wang, D. Niyato, Y. Wen, and Z. Han, “Resource management in cloud networking using economic analysis and pricing models: A survey,” IEEE Commun. Surveys Tuts., vol. 19, no. 2, pp. 954–1001, 2017.
[17] K. Wang, Q. Zhou, S. Guo, and J. Luo, “Cluster frame-


works for efficient scheduling and resource allocation in mobile and wireless networking: A survey,” IEEE
in data center networks: A survey,” IEEE Commun. Commun. Surveys Tuts., vol. 21, no. 3, pp. 2224–2287,
Surveys Tuts., vol. 20, no. 4, pp. 3560–3580, 2018. 2019.
[18] E. Ivashko, I. Chernov, and N. Nikitina, “A survey of [33] J. Chen and X. Ran, “Deep learning with edge comput-
desktop grid scheduling,” IEEE Trans Parallel Distrib ing: A review,” Proc IEEE, vol. 107, no. 8, pp. 1655–
Syst, vol. 29, no. 12, pp. 2882–2895, 2018. 1674, 2019.
[19] L. F. Bittencourt, A. Goldman, E. R. Madeira, N. L. da [34] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C.
Fonseca, and R. Sakellariou, “Scheduling in distributed Liang, Q. Yang, D. Niyato, and C. Miao, “Federated
systems: A cloud computing perspective,” Comput Sci learning in mobile edge networks: A comprehensive
Rev, vol. 30, pp. 31 – 54, 2018. survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 3,
[20] D. Xu, Y. Li, X. Chen, J. Li, P. Hui, S. Chen, and pp. 2031–2063, 2020.
J. Crowcroft, “A survey of opportunistic offloading,” [35] F. Hussain, S. A. Hassan, R. Hussain, and E. Hossain,
IEEE Commun. Surveys Tuts., vol. 20, no. 3, pp. 2198– “Machine learning for resource management in cellular
2236, 2018. and iot networks: Potentials, current solutions, and open
[21] M. Kumar, S. Sharma, A. Goel, and S. Singh, “A com- challenges,” IEEE Commun. Surveys Tuts., vol. 22,
prehensive survey for scheduling techniques in cloud no. 2, pp. 1251–1275, 2020.
computing,” J Netw Comput Appl, vol. 143, pp. 1 – 33, [36] A. Shakarami, M. Ghobaei-Arani, and A. Shahidinejad,
2019. “A survey on the computation offloading approaches
[22] A. Arunarani, D. Manjula, and V. Sugumaran, “Task in mobile edge computing: A machine learning-based
scheduling techniques in cloud computing: A literature perspective,” Comm Com Inf Sc, vol. 182, p. 107496,
survey,” Future Generat. Comput. Syst., vol. 91, pp. 407 2020.
– 415, 2019. [37] M. McClellan, C. Cervelló-Pastor, and S. Sallent, “Deep
[23] M. Adhikari, T. Amgoth, and S. N. Srirama, “A survey learning at the mobile edge: Opportunities for 5g net-
on scheduling strategies for workflows in cloud environ- works,” Appl. Sci., vol. 10, no. 14, p. 4735, 2020.
ment and emerging trends,” ACM Comput Surv, vol. 52, [38] A. Shakarami, M. Ghobaei-Arani, M. Masdari, and
no. 4, Aug. 2019. M. Hosseinzadeh, “A survey on the computation of-
[24] B. Wang, C. Wang, W. Huang, Y. Song, and X. Qin, “A floading approaches in mobile edge/cloud computing
survey and taxonomy on task offloading for edge-cloud environment: a stochastic-based perspective,” J Grid
computing,” IEEE Access, vol. 8, pp. 186 080–186 101, Comput, pp. 1–33, 2020.
2020. [39] X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan,
[25] H. Lin, S. Zeadally, Z. Chen, H. Labiod, and L. Wang, and X. Chen, “Convergence of edge computing and
“A survey on computation offloading modeling for edge deep learning: A comprehensive survey,” IEEE Com-
computing,” J Netw Comput Appl, vol. 169, p. 102781, mun. Surveys Tuts., vol. 22, no. 2, pp. 869–904, 2020.
2020. [40] D. Xu, T. Li, Y. Li, X. Su, S. Tarkoma, T. Jiang,
[26] M. H. Hilman, M. A. Rodriguez, and R. Buyya, “Mul- J. Crowcroft, and P. Hui, “Edge intelligence: Archi-
tiple workflows scheduling in multi-tenant distributed tectures, challenges, and applications,” arXiv preprint
systems: A taxonomy and future directions,” ACM Com- arXiv:2003.12172, 2020.
put Surv, vol. 53, no. 1, Feb. 2020. [41] Y. Liu, M. Peng, G. Shou, Y. Chen, and S. Chen,
[27] Q.-H. Nguyen and F. Dressler, “A smartphone per- “Toward edge intelligence: Multiaccess edge computing
spective on computation offloading—a survey,” Comput for 5g and internet of things,” IEEE Int. Things J., vol. 7,
Commun, vol. 159, pp. 133 – 154, 2020. no. 8, pp. 6722–6747, 2020.
[28] A. Islam, A. Debnath, M. Ghose, and S. Chakraborty, [42] F. Saeik, M. Avgeris, D. Spatharakis, N. Santi, D. De-
“A survey on task offloading in multi-access edge chouniotis, J. Violos, A. Leivadeas, N. Athanasopoulos,
computing,” J Syst Architect, vol. 118, p. 102225, 2021. N. Mitton, and S. Papavassiliou, “Task offloading in
[29] S. Yang, F. Li, S. Trajanovski, R. Yahyapour, and X. Fu, edge and cloud computing: A survey on mathemati-
“Recent advances of resource allocation in network cal, artificial intelligence and control theory solutions,”
function virtualization,” IEEE Trans Parallel Distrib Comm Com Inf Sc, vol. 195, p. 108177, 2021.
Syst, vol. 32, no. 2, pp. 295–314, 2021. [43] Q. Luo, S. Hu, C. Li, G. Li, and W. Shi, “Resource
[30] S. Chen, Q. Li, M. Zhou, and A. Abusorrah, “Recent scheduling in edge computing: A survey,” IEEE Com-
advances in collaborative scheduling of computing tasks mun. Surveys Tuts., pp. 1–1, 2021.
in an edge computing paradigm,” Ah S Sens, vol. 21, [44] Y. Jiang, “A survey of task allocation and load balancing
no. 3, p. 779, 2021. in distributed systems,” IEEE Trans Parallel Distrib
[31] Y. Xu, G. Gui, H. Gacanin, and F. Adachi, “A survey Syst, vol. 27, no. 2, pp. 585–599, 2016.
on resource allocation for 5g heterogeneous networks: [45] Q.-V. Pham, F. Fang, V. N. Ha, M. J. Piran, M. Le,
Current research, future trends, and challenges,” IEEE L. B. Le, W.-J. Hwang, and Z. Ding, “A survey of
Commun. Surveys Tuts., vol. 23, no. 2, pp. 668–695, multi-access edge computing in 5g and beyond: Fun-
2021. damentals, technology integration, and state-of-the-art,”
[32] C. Zhang, P. Patras, and H. Haddadi, “Deep learning IEEE Access, vol. 8, pp. 116 974–117 017, 2020.


[46] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. vol. 54, pp. 216–228, 2017.
Letaief, “A survey on mobile edge computing: The [60] E. Schiller, N. Nikaein, E. Kalogeiton, M. Gasparyan,
communication perspective,” IEEE Commun. Surveys and T. Braun, “Cds-mec: Nfv/sdn-based application
Tuts., vol. 19, no. 4, pp. 2322–2358, 2017. management for mec in 5g systems,” Comm Com Inf
[47] N. F. V. ETSI, “Network functions virtualisation (nfv),” Sc, vol. 135, pp. 96–107, 2018.
https://portal.etsi.org/nfv/nfv white paper.pdf. [61] P. Shantharama, A. S. Thyagaturu, N. Karakoc, L. Fer-
[48] R. Sairam, S. S. Bhunia, V. Thangavelu, and M. Gu- rari, M. Reisslein, and A. Scaglione, “Layback: Sdn
rusamy, “Netra: Enhancing iot security using nfv-based management of multi-access edge computing (mec) for
edge traffic analysis,” IEEE Sensors J, vol. 19, no. 12, network access services and radio resource sharing,”
pp. 4660–4671, 2019. IEEE Access, vol. 6, pp. 57 545–57 561, 2018.
[49] A. Pastor, A. Mozo, D. R. Lopez, J. Folgueira, and [62] A. Nasrallah, A. S. Thyagaturu, Z. Alharbi, C. Wang,
A. Kapodistria, “The mouseworld, a security traffic X. Shao, M. Reisslein, and H. ElBakoury, “Ultra-low
analysis lab based on nfv/sdn,” in Proc. 13th Int. Conf. latency (ull) networks: The ieee tsn and ietf detnet
Availability Rel. Secur., 2018, pp. 1–6. standards and related 5g ull research,” IEEE Commun.
[50] M. Pattaranantakul, R. He, Q. Song, Z. Zhang, and Surveys Tuts., vol. 21, no. 1, pp. 88–145, 2019.
A. Meddahi, “Nfv security survey: From use case [63] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza,
driven threat analysis to state-of-the-art countermea- “A survey of machine learning techniques applied to
sures,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, self-organizing cellular networks,” IEEE Commun. Sur-
pp. 3330–3368, 2018. veys Tuts., vol. 19, no. 4, pp. 2392–2431, 2017.
[51] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, [64] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelli-
F. De Turck, and R. Boutaba, “Network function virtual- gent wireless networks: A comprehensive survey,” IEEE
ization: State-of-the-art and research challenges,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2595–2621,
Commun. Surveys Tuts., vol. 18, no. 1, pp. 236–262, 2018.
2016. [65] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and
[52] T. Wang, J. Zu, G. Hu, and D. Peng, “Adaptive service M. Guizani, “Deep learning for iot big data and stream-
function chain scheduling in mobile edge computing via ing analytics: A survey,” IEEE Commun. Surveys Tuts.,
deep reinforcement learning,” IEEE Access, vol. 8, pp. vol. 20, no. 4, pp. 2923–2960, 2018.
164 922–164 935, 2020. [66] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang,
[53] L. Linguaglossa, S. Lange, S. Pontarelli, G. Rétvári, Y.-C. Liang, and D. I. Kim, “Applications of deep rein-
D. Rossi, T. Zinner, R. Bifulco, M. Jarschel, and forcement learning in communications and networking:
G. Bianchi, “Survey of performance acceleration tech- A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4,
niques for network function virtualization,” Proc IEEE, pp. 3133–3174, 2019.
vol. 107, no. 4, pp. 746–764, 2019. [67] Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung,
[54] P. Shantharama, A. S. Thyagaturu, and M. Reisslein, “Blockchain and machine learning for communications
“Hardware-accelerated platforms and infrastructures for and networking systems,” IEEE Commun. Surveys Tuts.,
network functions: A survey of enabling technologies vol. 22, no. 2, pp. 1392–1431, 2020.
and research studies,” IEEE Access, vol. 8, pp. 132 021– [68] Y. Sun, J. Liu, J. Wang, Y. Cao, and N. Kato, “When
132 085, 2020. machine learning meets privacy in 6g: A survey,” IEEE
[55] R. Riggio, S. N. Khan, T. Subramanya, I. G. B. Yahia, Commun. Surveys Tuts., vol. 22, no. 4, pp. 2694–2724,
and D. Lopez, “Lightmano: Converging nfv and sdn 2020.
at the edges of the network,” in NOMS 2018 - 2018 [69] M. A. Al-Garadi, A. Mohamed, A. K. Al-Ali, X. Du,
IEEE/IFIP Netw. Operations Manage. Symp., 2018, pp. I. Ali, and M. Guizani, “A survey of machine and deep
1–9. learning methods for internet of things (iot) security,”
[56] C.-L. I, S. Kuklinskı́, and T. Chen, “A perspective of IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 1646–
o-ran integration with mec, son, and network slicing in 1685, 2020.
the 5g era,” IEEE Netw., vol. 34, no. 6, pp. 3–4, 2020. [70] S. Dong, P. Wang, and K. Abbas, “A survey on deep
[57] A. Reznik, L. M. C. Murillo, Y. Fang, W. Featherstone, learning and its applications,” Comput Sci Rev, vol. 40,
M. Filippou, F. Fontes, F. Giust, Q. Huang, A. Li, p. 100379, 2021.
C. Turyagyenda et al., “Cloud ran and mec: A perfect [71] F. Tang, B. Mao, Y. Kawamoto, and N. Kato, “Survey
pairing,” ETSI White paper, no. 23, pp. 1–24, 2018. on machine learning for intelligent end-to-end commu-
[58] Z. Lv and W. Xiu, “Interaction of edge-cloud computing nication toward 6g: From network access, routing to
based on sdn and nfv for next generation iot,” IEEE traffic control and streaming adaption,” IEEE Commun.
Internet Things J, vol. 7, no. 7, pp. 5706–5712, 2020. Surveys Tuts., vol. 23, no. 3, pp. 1578–1598, 2021.
[59] B. Blanco, J. O. Fajardo, I. Giannoulakis, E. Kafetzakis, [72] M. S. Murshed, C. Murphy, D. Hou, N. Khan, G. Anan-
S. Peng, J. Pérez-Romero, I. Trajkovska, P. S. Kho- thanarayanan, and F. Hussain, “Machine learning at
dashenas, L. Goratti, M. Paolino et al., “Technology the network edge: A survey,” ACM Computing Surveys
pillars in the architecture of future 5g mobile networks: (CSUR), vol. 54, no. 8, pp. 1–37, 2021.
Nfv, mec and sdn,” Comput. Standards & Interfaces, [73] J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg,


T. Verbelen, and J. S. Rellermeyer, “A survey on “An online algorithm for task offloading in heteroge-
distributed machine learning,” ACM Comput Surv, neous mobile clouds,” ACM Trans Internet Technol,
vol. 53, no. 2, Mar. 2020. [Online]. Available: vol. 18, no. 2, Jan. 2018.
https://doi.org/10.1145/3377454 [89] P. Lin, Q. Song, F. R. Yu, D. Wang, and L. Guo, “Task
[74] I. Portugal, P. Alencar, and D. Cowan, “The use of offloading for wireless vr-enabled medical treatment
machine learning algorithms in recommender systems: with blockchain security using collective reinforcement
A systematic review,” Expert Syst Appl, vol. 97, pp. 205 learning,” IEEE Internet Things J, pp. 1–1, 2021.
– 227, 2018. [90] A. Samanta, Z. Chang, and Z. Han, “Latency-oblivious
[75] D. Ucci, L. Aniello, and R. Baldoni, “Survey of ma- distributed task scheduling for mobile edge computing,”
chine learning techniques for malware analysis,” Com- in 2018 IEEE Global Commun. Conf. (GLOBECOM),
put. & Secur., vol. 81, pp. 123 – 147, 2019. 2018, pp. 1–7.
[76] S. Shadroo, A. M. Rahmani, and A. Rezaee, “The two- [91] J. Huang, S. Li, and Y. Chen, “Revenue-optimal task
phase scheduling based on deep learning in the internet scheduling and resource management for iot batch jobs
of things,” Comm Com Inf Sc, vol. 185, p. 107684, 2021. in mobile edge computing,” Peer-to-Peer Netw. Appl.,
[77] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, “A no. 8, 2020.
survey of the recent architectures of deep convolutional [92] Y. Cui, D. Zhang, T. Zhang, P. Yang, and H. Zhu,
neural networks,” Artif Intell Rev, vol. 53, no. 8, pp. “A new approach on task offloading scheduling for
5455–5516, 2020. application of mobile edge computing,” in 2021 IEEE
[78] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Im- Wireless Commun. Netw. Conf. (WCNC), 2021, pp. 1–6.
agenet classification with deep convolutional neural [93] X. Jiang, F. R. Yu, T. Song, and V. C. M. Leung, “A
networks,” Adv. Neural Inf. Process. Syst., vol. 25, 2012. survey on multi-access edge computing applied to video
[79] K. Simonyan and A. Zisserman, “Very deep convo- streaming: Some research issues and challenges,” IEEE
lutional networks for large-scale image recognition,” Commun. Surveys Tuts., vol. 23, no. 2, pp. 871–903,
arXiv preprint arXiv:1409.1556, 2014. 2021.
[80] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, [94] E. Coffman, J. Csirik, G. Galambos, S. Martello, and
D. Anguelov, D. Erhan, V. Vanhoucke, and A. Ra- D. Vigo, Bin Packing Approximation Algorithms: Survey
binovich, “Going deeper with convolutions,” in 2015 and Classification, 01 2012, p. (to appear).
IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), [95] P. Festa, “A brief introduction to exact, approximation,
2015, pp. 1–9. and heuristic algorithms for solving hard combinatorial
[81] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual optimization problems,” in 2014 16th Int. Conf. Trans-
learning for image recognition,” in Proc. IEEE Conf. parent Opt. Netw. (ICTON). IEEE, 2014, pp. 1–20.
Comput. Vision Pattern Recognit. (CVPR), June 2016. [96] J. Shen, N. Yi, B. Wu, W. Jiang, and H. Xiang, “A
[82] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, greedy-based resource allocation algorithm for multicast
W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level and unicast services in ofdm system,” in 2009 Int. Conf.
accuracy with 50x fewer parameters and¡ 0.5 mb model Wireless Commun. Signal Process., 2009, pp. 1–5.
size,” arXiv preprint arXiv:1602.07360, 2016. [97] Y. Fan, L. Wang, W. Wu, and D. Du, “Cloud/edge
[83] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, computing resource allocation and pricing for mobile
W. Wang, T. Weyand, M. Andreetto, and H. Adam, blockchain: An iterative greedy and search approach,”
“Mobilenets: Efficient convolutional neural networks IEEE Trans Comput Social Syst, vol. 8, no. 2, pp. 451–
for mobile vision applications,” arXiv preprint 463, 2021.
arXiv:1704.04861, 2017. [98] F. Wei, S. Chen, and W. Zou, “A greedy algorithm
[84] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, for task offloading in mobile edge computing system,”
P. Sidike, M. S. Nasrin, M. Hasan, B. C. Van Essen, China Commun, vol. 15, no. 11, pp. 149–157, 2018.
A. A. Awwal, and V. K. Asari, “A state-of-the-art survey [99] M. T. Islam, A.-E. M. Taha, S. Akl, and S. Choudhury,
on deep learning theory and architectures,” Electronics, “A local search algorithm for resource allocation for
vol. 8, no. 3, p. 292, 2019. underlaying device-to-device communications,” in 2015
[85] L. Deng, G. Li, S. Han, L. Shi, and Y. Xie, “Model IEEE Global Commun. Conf. (GLOBECOM), 2015, pp.
compression and hardware acceleration for neural net- 1–6.
works: A comprehensive survey,” Proc IEEE, vol. 108, [100] Q. Wei, W. Sun, B. Bai, L. Wang, E. G. Ström, and
no. 4, pp. 485–532, 2020. M. Song, “Resource allocation for v2x communications:
[86] “Deep learning training vs deep learning infer- A local search based 3d matching approach,” in 2017
ence,” https://premioinc.com/blogs/blog/deep-learning- IEEE Int. Conf. Commun. (ICC), 2017, pp. 1–6.
training-vs-deep-learning-inference. [101] A. L. Stolyar, “Greedy primal-dual algorithm for
[87] B. Taylor, V. S. Marco, W. Wolff, Y. Elkhatib, and dynamic resource allocation in complex networks,”
Z. Wang, “Adaptive deep learning model selection on Queueing Syst, vol. 54, no. 3, pp. 203–220, 2006.
embedded systems,” ACM SIGPLAN Notices, vol. 53, [102] M. Chen and J. Huang, “Optimal resource allocation for
no. 6, pp. 31–43, 2018. ofdm uplink communication: A primal-dual approach,”
[88] B. Zhou, A. V. Dastjerdi, R. N. Calheiros, and R. Buyya, in 2008 42nd Annu. Conf. Inf. Sci. Syst., 2008, pp. 926–


931. [117] J. L. de Souza Toniolli and B. Jaumard, “Resource


[103] Y.-H. Chiang, T. Zhang, and Y. Ji, “Joint cotask-aware allocation for multiple workflows in cloud-fog
offloading and scheduling in mobile edge computing computing systems,” in Proc. 12th IEEE/ACM Int.
systems,” IEEE Access, vol. 7, pp. 105 008–105 018, Conf. Utility Cloud Comput. Companion, ser. UCC
2019. ’19 Companion. New York, NY, USA: Association
[104] V. T. Chakaravarthy, A. R. Choudhury, S. Gupta, S. Roy, for Computing Machinery, 2019, p. 77–84. [Online].
and Y. Sabharwal, “Improved algorithms for resource Available: https://doi.org/10.1145/3368235.3368846
allocation under varying capacity,” in Eur. Symp. Algo- [118] M. Y. Özkaya, A. Benoit, B. Uçar, J. Herrmann, and
rithms. Springer, 2014, pp. 222–234. Ü. V. Çatalyürek, “A scalable clustering-based task
[105] K. Mukherjee, P. Dutta, G. Raravi, T. Rajasubramaniam, scheduler for homogeneous processors using dag par-
K. Dasgupta, and A. Singh, “Fair resource allocation titioning,” in IEEE Int. Parallel Distrib. Process. Symp.
for heterogeneous tasks,” in 2015 IEEE Int. Parallel (IPDPS). IEEE, 2019, pp. 155–165.
Distrib. Process. Symp., 2015, pp. 1087–1096. [119] A. Dogan and R. Ozguner, “Ldbs: A duplication based
[106] Wikipedia, “Heuristic.” [Online]. Available: scheduling algorithm for heterogeneous computing sys-
https://en.wikipedia.org/wiki/Heuristic tems,” 2002, pp. 352 – 359.
[107] D. Ouelhadj and S. Petrovic, “A survey of dynamic [120] H. Chen, X. Zhu, D. Qiu, L. Liu, and Z. Du, “Schedul-
scheduling in manufacturing systems,” J. Scheduling, ing for workflows with security-sensitive intermediate
vol. 12, no. 4, pp. 417–431, 2009. data by selective tasks duplication in clouds,” IEEE
[108] H. Djigal, J. Feng, and J. Lu, “Task scheduling for Trans Parallel Distrib Syst, vol. 28, no. 9, pp. 2674–
heterogeneous computing using a predict cost matrix,” 2688, 2017.
in Proc. 48th Int. Conf. Parallel Process.: Workshops, [121] K. He, X. Meng, Z. Pan, L. Yuan, and P. Zhou, “A
ser. ICPP 2019. New York, NY, USA: ACM, 2019, novel task-duplication based clustering algorithm for
pp. 25:1–25:10. heterogeneous computing environments,” IEEE Trans
[109] H. Arabnejad and J. G. Barbosa, “List scheduling Parallel Distrib Syst, vol. 30, no. 1, pp. 2–14, Jan 2019.
algorithm for heterogeneous systems by an optimistic [122] F. Wu, Q. Wu, and Y. Tan, “Workflow scheduling in
cost table,” IEEE Trans Parallel Distrib Syst, vol. 25, cloud: a survey,” J. Supercomput., vol. 71, no. 9, pp.
no. 3, pp. 682–694, 2014. 3373–3418, 2015.
[110] H. Djigal, J. Feng, and J. Lu, “Performance evalu- [123] Y. Liu, L. Meng, and H. Tomiyama, “A genetic algo-
ation of security-aware list scheduling algorithms in rithm for scheduling of data-parallel tasks on multicore
iaas cloud,” in 2020 20th IEEE/ACM International architectures,” IPSJ Trans. Syst. LSI Des. Methodol.,
Symposium on Cluster, Cloud and Internet Computing vol. 12, pp. 74–77, 2019.
(CCGRID), 2020, pp. 330–339. [124] S. Basu, M. Karuppiah, K. Selvakumar, K.-C. Li, S. H.
[111] P. Paymard and N. Mokari, “Resource allocation in pd- Islam, M. M. Hassan, and M. Z. A. Bhuiyan, “An
noma–based mobile edge computing system: Multiuser intelligent/cognitive model of task scheduling for iot
and multitask priority,” Trans Emerg Telecommun Tech- applications in cloud computing environment,” Future
nologies, p. e3631, 2019. Gener. Comput. Syst, vol. 88, pp. 254–261, 2018.
[112] H. Djigal, F. Jun, J. Lu, and J. Ge, “Ippts: An efficient [125] R. L. Kadri and F. F. Boctor, “An efficient genetic algo-
algorithm for scientific workflow scheduling in het- rithm to solve the resource-constrained project schedul-
erogeneous computing systems,” IEEE Trans Parallel ing problem with transfer times: The single mode case,”
Distrib Syst, pp. 1–1, 2020. Eur J Oper Res, vol. 265, no. 2, pp. 454–462, 2018.
[113] H. Djigal, L. Liu, J. Luo, and J. Xu, “Buda: Budget and [126] M. Nouiri, A. Bekrar, A. Jemai, S. Niar, and A. C.
deadline aware scheduling algorithm for task graphs in Ammari, “An effective and distributed particle swarm
heterogeneous systems,” in 2022 IEEE/ACM 30th In- optimization algorithm for flexible job-shop scheduling
ternational Symposium on Quality of Service (IWQoS), problem,” J. Intell. Manuf., vol. 29, no. 3, pp. 603–615,
2022, pp. 1–10. 2018.
[114] A. Yoosefi and H. R. Naji, “A clustering algorithm [127] Q. You and B. Tang, “Efficient task offloading using
for communication-aware scheduling of task graphs on particle swarm optimization algorithm in edge comput-
multi-core reconfigurable systems,” IEEE Trans Parallel ing for industrial internet of things,” J Cloud Comput,
Distrib Syst, no. 10, pp. 2718–2732, 2017. vol. 10, no. 1, pp. 1–11, 2021.
[115] H. Kanemitsu, M. Hanada, and H. Nakazato, [128] R. S. Elhabyan and M. C. Yagoub, “Two-tier particle
“Clustering-based task scheduling in a large number of swarm optimization protocol for clustering and routing
heterogeneous processors,” IEEE Trans Parallel Distrib in wireless sensor network,” J. Netw. Comput. Appl.,
Syst, vol. 27, no. 11, pp. 3144–3157, 2016. vol. 52, pp. 116 – 128, 2015.
[116] L. Dong, M. N. Satpute, J. Shan, B. Liu, Y. Yu, [129] W. Deng, J. Xu, and H. Zhao, “An improved ant colony
and T. Yan, “Computation offloading for mobile-edge optimization algorithm based on hybrid strategies for
computing with multi-user,” in 2019 IEEE 39th Int. scheduling problem,” IEEE Access, vol. 7, pp. 20 281–
Conf. Distrib. Comput. Syst. (ICDCS), 2019, pp. 841– 20 292, 2019.
850. [130] Y. Moon, H. Yu, J.-M. Gil, and J. Lim, “A slave


ants based ant colony optimization algorithm for task vol. 20, no. 3, pp. 1092–1109, 2021.
scheduling in cloud computing environments,” Human- [144] X. Shen, J. Gao, W. Wu, K. Lyu, M. Li, W. Zhuang,
centric Comput. Inf. Sciences, vol. 7, no. 1, p. 28, 2017. X. Li, and J. Rao, “Ai-assisted network-slicing based
[131] J. Meshkati and F. Safi-Esfahani, “Energy-aware re- next-generation wireless networks,” IEEE Open J. Veh.
source utilization based on particle swarm optimization Technol., vol. 1, pp. 45–66, 2020.
and artificial bee colony algorithms in cloud comput- [145] M. H. Abidi, H. Alkhalefah, K. Moiduddin, M. Alazab,
ing,” J Supercomput, vol. 75, no. 5, pp. 2455–2496, M. K. Mohammed, W. Ameen, and T. R. Gadekallu,
2019. “Optimal 5g network slicing using machine learning
[132] S. Chai, Y. Li, J. Wang, and C. Wu, “A list simulated and deep learning concepts,” Comput Standards Inter-
annealing algorithm for task scheduling on network-on- faces, vol. 76, p. 103518, 2021.
chip.” JCP, vol. 9, no. 1, pp. 176–182, 2014. [146] V. P. Kafle, Y. Fukushima, P. Martinez-Julia, and
[133] C. Gallo and V. Capozzi, “A simulated annealing al- T. Miyazawa, “Consideration on automation of 5g
gorithm for scheduling problems,” J Appl Math Phys, network slicing with machine learning,” in 2018 ITU
vol. 7, no. 11, pp. 2579–2594, 2019. Kaleidoscope: Mach. Learn. a 5G Future (ITU K).
[134] J. J. F. S. Avinash Dixit, “Game theory explained,” IEEE, 2018, pp. 1–8.
https://www.pbs.org/wgbh/americanexperience/features [147] Y. Liu, H. Lu, X. Li, Y. Zhang, L. Xi, and D. Zhao, “Dy-
/nash-game/. namic service function chain orchestration for nfv/mec-
[135] W. Lu, W. Wu, J. Xu, P. Zhao, D. Yang, and L. Xu, enabled iot networks: A deep reinforcement learning
“Auction design for cross-edge task offloading in het- approach,” IEEE Int. Things J., vol. 8, no. 9, pp. 7450–
erogeneous mobile edge clouds,” Computer Communi- 7465, 2021.
cations, vol. 181, pp. 90–101, 2022. [148] T. Subramanya, D. Harutyunyan, and R. Riggio, “Ma-
[136] W. Lu, S. Zhang, J. Xu, D. Yang, and L. Xu, “Truthful chine learning-driven service function chain placement
multi-resource transaction mechanism for p2p task of- and scaling in mec-enabled 5g networks,” Comput.
floading based on edge computing,” IEEE Transactions Netw., vol. 166, p. 106980, 2020.
on Vehicular Technology, vol. 70, no. 6, pp. 6122–6135, [149] B. Trinh and G.-M. Muntean, “A deep reinforcement
2021. learning-based resource management scheme for sdn-
[137] J. Moura and D. Hutchison, “Game theory for multi- mec-supported xr applications,” in 2022 IEEE 19th
access edge computing: Survey, use cases, and future Annu. Consum. Commun. Netw. Conf. (CCNC), 2022,
trends,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 790–795.
pp. 260–288, 2019. [150] C. Li, C. Qianqian, and Y. Luo, “Low-latency edge
[138] X. Feng, Y. Liu, and S. Wei, “Livedeep: Online viewport cooperation caching based on base station cooperation
prediction for live virtual reality streaming using life- in sdn based mec,” Expert Syst. Appl., vol. 191, p.
long deep learning,” in 2020 IEEE Conf. Virtual Real. 116252, 2022.
3D User Interfaces (VR), 2020, pp. 800–808. [151] H. Zhang, R. Wang, W. Sun, and H. Zhao, “Mobil-
[139] C.-J. Wu, D. Brooks, K. Chen, D. Chen, S. Choudhury, ity management for blockchain-based ultra-dense edge
M. Dukhan, K. Hazelwood, E. Isaac, Y. Jia, B. Jia, computing: A deep reinforcement learning approach,”
T. Leyvand, H. Lu, Y. Lu, L. Qiao, B. Reagen, J. Spisak, IEEE Trans. Wireless Commun., vol. 20, no. 11, pp.
F. Sun, A. Tulloch, P. Vajda, X. Wang, Y. Wang, 7346–7359, 2021.
B. Wasti, Y. Wu, R. Xian, S. Yoo, and P. Zhang, “Ma- [152] D. Wang, X. Tian, H. Cui, and Z. Liu, “Reinforce-
chine learning at facebook: Understanding inference ment learning-based joint task offloading and migration
at the edge,” in 2019 IEEE Int. Symp. High Perform. schemes optimization in mobility-aware mec network,”
Comput. Architecture (HPCA), 2019, pp. 331–344. China Commun., vol. 17, no. 8, pp. 31–44, 2020.
[140] S. Tuli, N. Basumatary, S. S. Gill, M. Kahani, R. C. [153] A. Lekharu, M. Jain, A. Sur, and A. Sarkar, “Deep
Arya, G. S. Wander, and R. Buyya, “Healthfog: An learning model for content aware caching at mec
ensemble deep learning based smart healthcare system servers,” IEEE Trans Netw Service Manag, 2021.
for automatic diagnosis of heart diseases in integrated [154] W.-C. Chien, H.-Y. Weng, and C.-F. Lai, “Q-learning
iot and fog computing environments,” Future Gener based collaborative cache allocation in mobile edge
Comp Sy, vol. 104, pp. 187–200, 2020. computing,” Future Gener. Comput. Syst., vol. 102, pp.
[141] K. Chakrabarti, “Deep learning based offloading for 603–610, 2020.
mobile augmented reality application in 6g,” Comput. [155] L. Sun, L. Wan, and X. Wang, “Learning-based resource
& Elect. Eng., vol. 95, p. 107381, 2021. allocation strategy for industrial iot in uav-enabled mec
[142] Z. Wu and D. Yan, “Deep reinforcement learning- systems,” IEEE Trans. Ind. Inf., vol. 17, no. 7, pp. 5031–
based computation offloading for 5g vehicle-aware 5040, 2021.
multi-access edge computing network,” China Commun, [156] H. Peng and X. Shen, “Multi-agent reinforcement learn-
vol. 18, no. 11, pp. 26–41, 2021. ing based resource management in mec- and uav-
[143] A. Asheralieva and D. Niyato, “Learning-based mobile assisted vehicular networks,” IEEE J. Sel. Areas Com-
edge computing resource management to support public mun., vol. 39, no. 1, pp. 131–141, 2021.
blockchain networks,” IEEE Trans. Mobile Comput., [157] S. Vimal, M. Khari, N. Dey, R. G. Crespo, and


Y. Harold Robinson, “Enhanced resource allocation in Graf, “Pruning filters for efficient convnets,” arXiv
mobile edge computing using reinforcement learning preprint arXiv:1608.08710, 2016.
based moaco algorithm for iiot,” Comput. Commun., [173] H. Hu, R. Peng, Y.-W. Tai, and C.-K. Tang, “Net-
vol. 151, pp. 355–364, 2020. work trimming: A data-driven neuron pruning approach
[158] L. Liu, J. Feng, Q. Pei, C. Chen, Y. Ming, B. Shang, towards efficient deep architectures,” arXiv preprint
and M. Dong, “Blockchain-enabled secure data sharing arXiv:1607.03250, 2016.
scheme in mobile-edge computing: An asynchronous [174] Y. Jiang, S. Wang, V. Valls, B. J. Ko, W.-H. Lee,
advantage actor–critic learning approach,” IEEE Int. K. K. Leung, and L. Tassiulas, “Model pruning enables
Things J., vol. 8, no. 4, pp. 2342–2353, 2021. efficient federated learning on edge devices,” arXiv
[159] U. Majeed and C. S. Hong, “Flchain: Federated learn- preprint arXiv:1909.12326, 2019.
ing via mec-enabled blockchain network,” in 2019 [175] T. Choudhary, V. Mishra, A. Goswami, and J. Saranga-
20th Asia-Pacific Net. Operations Manage. Symp. (AP- pani, “A comprehensive survey on model compression
NOMS), 2019, pp. 1–4. and acceleration,” Artif Intell Rev, vol. 53, no. 7, pp.
[160] Z. Mlika and S. Cherkaoui, “Network slicing with mec 5113–5155, 2020.
and deep reinforcement learning for the internet of [176] A. Berthelier, T. Chateau, S. Duffner, C. Garcia, and
vehicles,” IEEE Netw, vol. 35, no. 3, pp. 132–138, 2021. C. Blanc, “Deep model compression and architecture
[161] J. Du, F. R. Yu, G. Lu, J. Wang, J. Jiang, and X. Chu, optimization for embedded systems: A survey,” J Signal
“Mec-assisted immersive vr video streaming over tera- Process. Syst., vol. 93, no. 8, pp. 863–878, 2021.
hertz wireless networks: A deep reinforcement learning [177] C. N. Coelho, A. Kuusela, S. Li, H. Zhuang, J. Ngadi-
approach,” IEEE Int. Things J., vol. 7, no. 10, pp. 9517– uba, T. K. Aarrestad, V. Loncar, M. Pierini, A. A. Pol,
9529, 2020. and S. Summers, “Automatic heterogeneous quantiza-
[162] S. Wan, L. Qi, X. Xu, C. Tong, and Z. Gu, “Deep tion of deep neural networks for low-latency inference
learning models for real-time human activity recogni- on the edge for particle detectors,” Nature Mach. Intell.,
tion with smartphones,” Mobile Netw. Appl., vol. 25, vol. 3, no. 8, pp. 675–686, 2021.
no. 2, pp. 743–755, 2020. [178] A. Kwasniewska, M. Szankin, M. Ozga, J. Wolfe,
[163] A. Feriani, A. Refaey, and E. Hossain, “Tracking pan- A. Das, A. Zajac, J. Ruminski, and P. Rad, “Deep
demics: A mec-enabled iot ecosystem with learning learning optimization for edge devices: Analysis of
capability,” IEEE Int. Things Mag., vol. 3, no. 3, pp. training quantization parameters,” in IECON 2019 -
40–45, 2020. 45th Annu. Conf. IEEE Ind. Electronics Soc., vol. 1,
[164] J. Gou, B. Yu, S. J. Maybank, and D. Tao, “Knowledge 2019, pp. 96–101.
distillation: A survey,” Int J Comput Vision, vol. 129, [179] N. Tonellotto, A. Gotta, F. M. Nardini, D. Gadler, and
no. 6, pp. 1789–1819, 2021. F. Silvestri, “Neural network quantization in federated
[165] R. Reed, “Pruning algorithms-a survey,” IEEE Trans learning at the edge,” Inform Sciences, vol. 575, pp.
Neural Netw, vol. 4, no. 5, pp. 740–747, 1993. 417–436, 2021.
[166] T. Liang, J. Glossner, L. Wang, S. Shi, and X. Zhang, [180] S. Merity, C. Xiong, J. Bradbury, and R. Socher,
“Pruning and quantization for deep neural network “Pointer sentinel mixture models,” arXiv preprint
acceleration: A survey,” Neurocomputing, vol. 461, pp. arXiv:1609.07843, 2016.
370–403, 2021. [181] S. Wiedemann, S. Shivapakash, D. Becking, P. Wiede-
[167] J. Wang, L. Gou, W. Zhang, H. Yang, and H.-W. mann, W. Samek, F. Gerfers, and T. Wiegand, “Fantas-
Shen, “Deepvid: Deep visual interpretation and diag- tic4: A hardware-software co-design approach for ef-
nosis for image classifiers via knowledge distillation,” ficiently running 4bit-compact multilayer perceptrons,”
IEEE Trans Vis Comput Graphics, vol. 25, no. 6, 2019. IEEE Open J Circuits Syst., vol. 2, pp. 407–419, 2021.
[168] E. Tanghatari, M. Kamal, A. Afzali-Kusha, and M. Pe- [182] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “A survey
dram, “Distributing dnn training over iot edge devices of model compression and acceleration for deep neural
based on transfer learning,” Neurocomputing, vol. 467, networks,” arXiv preprint arXiv:1710.09282, 2017.
pp. 56–65, 2022. [183] L. Lai, N. Suda, and V. Chandra, “Cmsis-nn: Efficient
[169] Y. Lin, Y. Tu, and Z. Dou, “An improved neural neural network kernels for arm cortex-m cpus,” arXiv
network pruning technology for automatic modulation preprint arXiv:1801.06601, 2018.
classification in edge devices,” IEEE Trans Veh Technol, [184] J. Lin, W.-M. Chen, Y. Lin, C. Gan, S. Han et al.,
vol. 69, no. 5, pp. 5703–5706, 2020. “Mcunet: Tiny deep learning on iot devices,” Adv Neur
[170] Z. Zhou, H. Cai, S. Rong, Y. Song, K. Ren, W. Zhang, In, vol. 33, pp. 11 711–11 722, 2020.
Y. Yu, and J. Wang, “Activation maximization genera- [185] S. Mücke, N. Piatkowski, and K. Morik, “Hardware
tive adversarial nets,” arXiv preprint arXiv:1703.02000, acceleration of machine learning beyond linear algebra,”
2017. in Joint Eur. Conf. Mach. Learn. Knowl. Discovery
[171] T. J. O’shea and N. West, “Radio machine learning Databases. Springer, 2019, pp. 342–347.
dataset generation with gnu radio,” in Proc. GNU Radio [186] K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, “Haq:
Conf., vol. 1, no. 1, 2016. Hardware-aware automated quantization with mixed
[172] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. precision,” in Proc. IEEE/CVF Conf. Comput. Vision


Hamza Djigal received the BSc degree in mathematics from Cheikh Anta Diop University, Dakar, Senegal, in 2009, the MSc degree in software engineering from Central China Normal University, Wuhan, China, in 2014, and the PhD degree in computer science and technology from Hohai University, Nanjing, China, in 2020. He is currently a postdoctoral researcher with the School of Computer Science, Nanjing University of Posts and Telecommunications, China. His main research interests include parallel and distributed computing, MEC, deep learning, and resource allocation in heterogeneous MEC networks.

Jia Xu (SM’21) received the M.S. degree from the School of Information and Engineering, Yangzhou University, Jiangsu, China, in 2006, and the Ph.D. degree from the School of Computer Science and Engineering, Nanjing University of Science and Technology, Jiangsu, China, in 2010. He is currently a professor in the School of Computer Science at Nanjing University of Posts and Telecommunications, China. He was a visiting scholar in the Department of Electrical Engineering & Computer Science at the Colorado School of Mines from November 2014 to May 2015. His main research interests include crowdsourcing, edge computing, and wireless sensor networks.

Linfeng Liu (M’13) received the B.S. and Ph.D. degrees in computer science from Southeast University, Nanjing, China, in 2003 and 2008, respectively. He is currently a Professor in the School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, China. His main research interests include vehicular ad hoc networks, wireless sensor networks, and multi-hop mobile wireless networks. He has published more than 80 peer-reviewed papers in technical journals and conference proceedings, such as IEEE TMC, IEEE TPDS, ACM TAAS, IEEE TSC, IEEE TVT, IEEE IoTJ, Computer Networks, and Elsevier JPDC.

Yan Zhang (F’19) received the Ph.D. degree from the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore. He is currently a Full Professor with the Department of Informatics, University of Oslo, Oslo, Norway. His research interests include next-generation wireless networks leading to 5G beyond/6G, and green and secure cyber-physical systems (e.g., smart grid and transport). Dr. Zhang is an Editor for several IEEE publications, including the IEEE COMMUNICATIONS MAGAZINE, IEEE NETWORK, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and the IEEE INTERNET OF THINGS JOURNAL. He has served as a chair of a number of conferences, including IEEE GLOBECOM and IEEE PIMRC. He is an IEEE Vehicular Technology Society Distinguished Lecturer and a Fellow of IET. He is the Chair of the IEEE Communications Society Technical Committee on Green Communications and Computing. Prof. Zhang was a recipient of the Highly Cited Researcher Award (Web of Science top 1% most cited) by Clarivate Analytics.
