Article

Dependent Task Offloading and Resource Allocation via Deep Reinforcement Learning for Extended Reality in Mobile Edge Networks

by Xiaofan Yu, Siyuan Zhou * and Baoxiang Wei
School of Information Science and Engineering, Hohai University, Nanjing 211100, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(13), 2528; https://doi.org/10.3390/electronics13132528
Submission received: 21 May 2024 / Revised: 21 June 2024 / Accepted: 24 June 2024 / Published: 27 June 2024

Abstract

Extended reality (XR) is an immersive technology widely applied in various fields. Due to the real-time interaction required between users and virtual environments, XR applications are highly sensitive to latency. Furthermore, handling computationally intensive tasks on wireless XR devices leads to energy consumption, which is a critical performance constraint for XR applications. It has been noted that an XR task can be decoupled into several subtasks with mixed serial–parallel relationships. Furthermore, the evaluation of XR application performance involves both subjective assessments from users and objective metrics, such as energy consumption. Therefore, in edge computing environments, how to coordinate the offloading of XR subtasks so as to meet users' demands for XR applications is a complex and challenging issue. To address this issue, this paper constructs a wireless XR system based on mobile edge computing (MEC) and studies the joint optimization of multi-user communication channel access and task offloading. Specifically, we consider the migration of partitioned XR tasks to MEC servers and formulate a joint optimization problem for communication channel access and task offloading. The objective is to maximize the ratio of quality of experience (QoE) to energy consumption while meeting the users' QoE requirements. Subsequently, we introduce a deep reinforcement learning-based algorithm to solve this optimization problem. The simulation results demonstrate the effectiveness of this algorithm in meeting user QoE demands and improving energy conversion efficiency, regardless of the XR task partitioning strategies employed.

1. Introduction

Extended reality (XR) is an advanced immersive technology in human–computer interaction, encompassing augmented reality (AR), virtual reality (VR), mixed reality (MR), and other emerging immersive technologies. AR overlays digital objects onto the real world, VR fully immerses users in a virtual environment, and MR combines elements of both, offering varying degrees of immersion and interactivity [1]. XR applications demand extremely low latency. Typically, XR applications are based on 360° panoramic videos, which require a motion-to-photon latency of less than 20 ms [2]. Furthermore, as XR applications often involve computationally intensive tasks processed on mobile devices, device power consumption becomes a critical factor limiting XR application performance. Multi-access edge computing (MEC) provides computational capabilities at the network edge near the user, within the radio access network (RAN), enabling lower latency and reduced backhaul network traffic [3]. How to fully utilize the computational resources of MEC servers and user equipment (UE) in a highly dynamic edge network to provide stable and reliable services for XR applications is a highly challenging problem.
Recently, there have been multiple initiatives exploring the use of MEC for task offloading in XR applications. A common approach is to employ binary offloading, where the entire XR task is offloaded to the MEC server as a unit [4,5,6,7]. Optimizing the joint rendering offloading and downlink power control in bandwidth-rich terahertz communication environments to minimize the long-term energy consumption is the focus of [4]. The prediction of the field of view of VR users and the migration of rendering tasks to MEC servers are employed to maximize the long-term quality of experience (QoE) [6]. Another approach involves partitioning XR tasks into two arbitrary portions for partial offloading [8,9]. A data-sharing model for AR tasks is established in [9], aiming to optimize both task-partitioning strategy and resource allocation, ultimately improving the quality of experience for AR device users. In order to achieve effective computation offloading strategies, a binary-based approach utilizing convolutional neural networks trained through experiential learning has been implemented to generate optimal offloading actions while maintaining the best computational efficiency [10].
Offloading XR tasks as a unit in a binary fashion is a straightforward approach, but it may not fully utilize the computational resources available in the edge network. Partial offloading schemes, on the other hand, enable the simultaneous computation of XR tasks on both MEC servers and user equipment, but the arbitrary partitioning of XR tasks can be challenging to implement effectively in practice. In [11,12], XR tasks are divided into five subtasks based on functional components and then offloaded at the granularity of subtasks to increase computational resource utilization efficiency and reduce user energy consumption. However, the former does not consider dependencies among subtasks, and the latter designs a task-offloading strategy only for serially executed subtasks, without exploring the possibility of parallel execution. Dependent tasks exhibit higher offloading dynamics; the work in [13] focuses on changes in inter-task dependency constraints to improve the quality of service of task graphs, but it does not consider changes in the computing capabilities of edge computing devices. In dynamic MEC systems, Markov decision processes have been applied to task graph scheduling: agents observe changes in the MEC system and intelligently make decisions regarding the scheduling of dependent tasks, providing computing services for application task execution in a first-come, first-served manner and thereby enhancing the user experience [14]. However, this work requires the computation results of each task to be transmitted back to the user, i.e., no data are transferred between tasks. In mobile user scenarios, users can adaptively offload computing tasks with dependency constraints to the MEC server or the cloud to enhance the user experience; in this context, the problem of minimizing the task completion time has been proven to be non-deterministic polynomial (NP)-hard [15]. When scheduling correlated tasks with priority awareness, tasks are categorized based on whether they have deadlines; the offloading process is modeled using a directed acyclic graph (DAG), and the impact of user mobility on task-offloading decisions is incorporated to select the optimal edge processor, thereby reducing completion times and enhancing task satisfaction [16]. However, these works evaluate performance only with objective data or metrics, ignoring the subjective user experience. For XR subtasks, only the final computation results need to be transmitted, thus avoiding unnecessary communication overhead. Therefore, when designing XR task-offloading methods, it is necessary to consider both the mixed serial–parallel relationships and the content transfer among subtasks. In previous works, system performance metrics have primarily focused on latency and energy consumption. However, for XR applications, which are known for their immersive nature, the subjective user experience should be considered the primary evaluation metric. QoE, as an indicator of video application performance, can capture user satisfaction with video services and has been shown to be a superior metric for optimizing network resource allocation [17].
In this paper, we focus on the collaborative utilization of computational resources in an edge network to enhance XR application computations. Given the limited computational capabilities of XR devices, offloading subtasks to MEC servers through an appropriate strategy can improve the QoE. Wireless communication is employed between the terminals and the MEC server to facilitate this process. To address this requirement, we model the XR task as a DAG to represent the dependencies among subtasks. We propose a joint optimization problem that considers both communication channel access and task offloading to maximize energy conversion efficiency, which is defined as the ratio of QoE to energy consumption. Since XR task partitioning leads to a complex joint optimization problem, we model it as a Markov decision process (MDP). In a real edge computing environment, the system evolves in continuous time; to ensure the effective processing of user requests when tasks are assigned to processors, the MDP is established by dividing the continuous-time state transitions into multiple consecutive time slots [18,19]. This method bridges the gap between continuous-time models and discrete-time models. To solve this problem, we introduce a channel access strategy and a task offloading strategy based on deep reinforcement learning (DRL), specifically leveraging the multi-agent deep deterministic policy gradient (MADDPG) method [20]. This approach significantly enhances the user experience quality while improving the energy conversion efficiency of the user equipment. The remainder of the paper is organized as follows. Section 2 constructs the edge computing network model and describes the XR task model. Section 3 formalizes the optimization problem. Section 4 presents the proposed algorithm for solving the problem. Section 5 showcases the simulation results and provides evaluations. Finally, Section 6 summarizes the paper.

2. System Model

We consider a wireless edge computing network model serving XR applications, as illustrated in Figure 1. The system comprises a base station equipped with a MEC server and U XR terminal devices. The MEC server deploys a computational model to provide computation capabilities to the XR terminal devices within its coverage area. We denote the set of XR devices by $\mathcal{U} = \{1, 2, \ldots, U\}$. Next, we describe the system from three perspectives: the XR task model, the communication model, and the task offloading model.

2.1. XR Task Model

We assume that each XR device processes only one XR task at a time, and that the XR task can be partitioned into N subtasks, denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$. We employ a DAG to represent the dependencies among these subtasks, where vertices represent subtasks and arcs represent dependencies between the input and output of subtasks. The subtask located at the arc head requires the computation result of the subtask located at the arc tail as input. Therefore, a subtask can only be computed after its predecessor subtasks (i.e., those located at the arc tail) have been executed. Figure 2 illustrates three different DAGs resulting from various task partitioning methods.
We define a tuple for the n-th subtask of the u-th XR user, denoted by $\phi_{u,n} = (\omega_{u,n}, \eta_{u,n}, \varphi_{u,n}, \Gamma_{u,n})$. Specifically, $\omega_{u,n}$ represents the input data size, $\eta_{u,n}$ represents the computational complexity, i.e., the number of Central Processing Unit (CPU) cycles required to process each bit of data, $\varphi_{u,n}$ represents the ratio of output data size to input data size, and $\Gamma_{u,n} \in \{0, 1, 2\}$ indicates the type of the subtask. $\Gamma_{u,n} = 0$ signifies that the subtask is an entry subtask, which does not depend on other subtasks. $\Gamma_{u,n} = 1$ signifies an intermediate subtask, which depends on other subtasks and is also depended upon by other subtasks. $\Gamma_{u,n} = 2$ signifies an exit subtask, which depends on other subtasks. In our task model, it is assumed that there is only one entry subtask, while there can be multiple exit subtasks. The entry subtask is typically responsible for receiving the entire input data of the task and partitioning the whole task, which means it can only be executed at the user's terminal. All the computational results from the exit subtasks need to be transmitted to the user's terminal, forming the computational result of the XR task.
Based on the analysis above, we can conclude that the input data size of the n-th subtask is equal to the sum of the output data sizes of all the subtasks on which it depends, which is expressed as follows:
$$\omega_{u,n} = \sum_{m \in pre(n)} \varphi_{u,m}\, \omega_{u,m}$$
where $pre(n)$ represents the set of predecessor subtasks on which the n-th subtask depends. Therefore, as long as we are aware of the input data size of the entry subtask, we can deduce the input data sizes of all subtasks.
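To make the propagation rule concrete, the following Python sketch (our own illustration, not the authors' implementation; the DAG, the output ratios, and the entry data size are hypothetical) evaluates the recursion over a small DAG in topological order.

# Illustrative sketch (not the authors' code): propagate input data sizes
# through a DAG of subtasks, given the entry subtask's input size and each
# subtask's output/input ratio phi. Subtask 1 is the entry subtask.
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}   # pre(n): hypothetical DAG
phi = {1: 0.6, 2: 0.4, 3: 0.5, 4: 0.1}      # output/input data-size ratios
omega = {1: 5e5}                            # entry input size in bits (assumed)

for n in sorted(pred):                      # subtasks listed in topological order
    if pred[n]:                             # omega_n = sum of phi_m * omega_m over pre(n)
        omega[n] = sum(phi[m] * omega[m] for m in pred[n])

print(omega)                                # omega[4] combines the outputs of subtasks 2 and 3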

2.2. Communication Model

We model the communication scenario as follows. There is a wireless network connection between the base station and the XR devices. We assume the use of orthogonal frequency division multiplexing (OFDM) for multiple access, where users share K orthogonal subchannels, and the set of subchannels is denoted by $\mathcal{K} = \{1, 2, \ldots, K\}$. During the user's channel access process, the binary channel selection vector of the u-th user is expressed as follows [21]:
$$\mathbf{c}_u = \left( c_u^1, \ldots, c_u^k, \ldots, c_u^K \right), \quad k \in \mathcal{K},\ u \in \mathcal{U}$$
where $c_u^k \in \{0, 1\}$; $c_u^k = 1$ indicates that the user is accessing the k-th channel, and $c_u^k = 0$ indicates not accessing the k-th channel. This assumes that each user can choose only one channel at the same time, i.e., $\sum_{k=1}^{K} c_u^k \le 1$.
When a user sends messages to the base station, it may encounter interference from other users, which can be expressed as follows:
$$I_{u,k}^{\mathrm{ul}} = \sum_{i=1,\, i \ne u}^{U} c_i^k\, h_{i,k}\, P_i^{\mathrm{Tx}}$$
where $h_{i,k}$ represents the Rayleigh channel gain when the i-th user uses the k-th channel, and $P_i^{\mathrm{Tx}}$ represents the transmit power of the i-th user. The signal-to-interference-plus-noise ratio (SINR) for the uplink and the signal-to-noise ratio (SNR) for the downlink can be represented as follows:
$$\gamma_{u,k}^{\mathrm{ul}} = \frac{c_u^k\, h_{u,k}\, P_u^{\mathrm{Tx}}}{I_{u,k}^{\mathrm{ul}} + B^{\mathrm{ul}} n_0}$$
$$\gamma_{u,k}^{\mathrm{dl}} = \frac{c_u^k\, h_{u,k}\, P_0}{B^{\mathrm{dl}} n_0}$$
where $P_0$ represents the transmit power of the base station, $B^{\mathrm{ul}}$ and $B^{\mathrm{dl}}$ respectively represent the uplink and downlink channel bandwidths, and $n_0$ represents the noise power spectral density. Additionally, using $\epsilon$ to represent the threshold of the bit error rate (BER), the uplink and downlink data transmission rates of the u-th user can be calculated as follows [22]:
$$r_u^{\mathrm{ul}} = \sum_{k=1}^{K} B^{\mathrm{ul}} \log_2\!\left( 1 - \frac{1.5\, \gamma_{u,k}^{\mathrm{ul}}}{\ln(5\epsilon)} \right)$$
$$r_u^{\mathrm{dl}} = \sum_{k=1}^{K} B^{\mathrm{dl}} \log_2\!\left( 1 - \frac{1.5\, \gamma_{u,k}^{\mathrm{dl}}}{\ln(5\epsilon)} \right)$$
Therefore, the delay for uploading the computation result of the n-th subtask of the u-th XR user is equal to the size of the computation result divided by the uplink transmission rate, which is expressed as follows:
$$t_{u,n}^{\mathrm{ul}} = \frac{\varphi_{u,n}\, \omega_{u,n}}{r_u^{\mathrm{ul}}}$$
The transmission energy consumption is equal to the transmit power multiplied by the transmission time, expressed as follows:
$$e_{u,n}^{\mathrm{ul}} = P_u^{\mathrm{Tx}}\, t_{u,n}^{\mathrm{ul}} = \frac{P_u^{\mathrm{Tx}}\, \varphi_{u,n}\, \omega_{u,n}}{r_u^{\mathrm{ul}}}$$
Correspondingly, the delay and receiving energy consumption for downloading the computation result are respectively expressed as follows:
$$t_{u,n}^{\mathrm{dl}} = \frac{\varphi_{u,n}\, \omega_{u,n}}{r_u^{\mathrm{dl}}}$$
$$e_{u,n}^{\mathrm{dl}} = P_u^{\mathrm{Rx}}\, t_{u,n}^{\mathrm{dl}} = \frac{P_u^{\mathrm{Rx}}\, \varphi_{u,n}\, \omega_{u,n}}{r_u^{\mathrm{dl}}}$$
where $P_u^{\mathrm{Rx}}$ represents the receiving power of the user.
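The following Python sketch ties the uplink part of this model together, from SINR to rate to upload delay and energy. It is a minimal illustration under assumed values; the channel gain, interference level, and result size below are placeholders rather than the paper's simulation settings.

import math

# Illustrative sketch of the uplink part of the communication model.
# All numbers are placeholders, not the paper's simulation parameters.
B_ul = 2.286e6                      # uplink channel bandwidth (Hz)
n0   = 10 ** (-174 / 10) * 1e-3     # noise PSD: -174 dBm/Hz converted to W/Hz
eps  = 1e-4                         # BER threshold
P_tx = 0.5                          # user transmit power (W)

def uplink_rate(h_uk, interference):
    """Rate on one accessed subchannel, following the BER-gap formula."""
    sinr = (h_uk * P_tx) / (interference + B_ul * n0)
    return B_ul * math.log2(1.0 - 1.5 * sinr / math.log(5.0 * eps))

h_uk = 1e-10                        # hypothetical channel gain on the chosen subchannel
I_uk = 2e-13                        # hypothetical co-channel interference power (W)
r_ul = uplink_rate(h_uk, I_uk)

out_bits = 0.4 * 3e5                # phi_{u,n} * omega_{u,n}: result size to upload (bits)
t_ul = out_bits / r_ul              # upload delay (s)
e_ul = P_tx * t_ul                  # upload energy (J)
print(f"rate={r_ul/1e6:.2f} Mbps, delay={t_ul*1e3:.2f} ms, energy={e_ul*1e3:.2f} mJ")

The downlink quantities follow the same pattern with $B^{\mathrm{dl}}$, $P_0$, and the receiving power $P_u^{\mathrm{Rx}}$.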

2.3. Task Offloading Model

In this paper, we investigate the collaborative execution of tasks between XR user terminals and the MEC server. We use $x_{u,n} \in \{0, 1\}$ to denote the offloading strategy of a subtask, where $x_{u,n} = 0$ indicates that the subtask is executed on the user terminal, and $x_{u,n} = 1$ indicates that it is executed on the MEC server. As specified earlier, the entry subtask can only be executed on the user equipment, so $x_{u,1} = 0$.
(1) Local Execution: Let $F_u$ represent the computational resources of the u-th user. When the n-th subtask of the u-th user is executed on the user terminal, the required computation time is expressed as follows:
$$t_{u,n}^{\mathrm{UE}} = \frac{\omega_{u,n}\, \eta_{u,n}}{F_u}$$
Before the n-th subtask begins execution, all its predecessor subtasks must have been completed, and their computation results must have been returned to the user terminal. Therefore, the readiness time of the n-th subtask is expressed as follows:
$$RT_{u,n}^{\mathrm{UE}} = \max_{m \in pre(n)} \left\{ FT_{u,m}^{\mathrm{UE}},\ \max\!\left( FT_{u,m}^{\mathrm{MEC}},\ M_{u,m}^{\mathrm{dl}} \right) + t_{u,m}^{\mathrm{dl}} \right\}$$
where $FT_{u,m}^{\mathrm{UE}}$ and $FT_{u,m}^{\mathrm{MEC}}$ respectively represent the finish times of subtasks executed on the user terminal and on the MEC server, and $M_{u,m}^{\mathrm{dl}}$ represents the earliest available time for downlink communication between the u-th user and the base station. In other words, if the downlink is currently transmitting data for other subtasks, it is necessary to wait until those transmissions are completed before transmitting the data of the current subtask. Therefore, when the n-th subtask of the u-th user is executed on the user terminal, the corresponding finish time $FT_{u,n}^{\mathrm{UE}}$ and terminal energy consumption $E_{u,n}^{\mathrm{UE}}$ [23] are respectively expressed as follows:
$$FT_{u,n}^{\mathrm{UE}} = \max\!\left( RT_{u,n}^{\mathrm{UE}},\ M_{u,n}^{\mathrm{UE}} \right) + t_{u,n}^{\mathrm{UE}}$$
$$E_{u,n}^{\mathrm{UE}} = \frac{1}{2} \kappa \left( F_u \right)^3 \cdot \frac{\omega_{u,n}\, \eta_{u,n}}{F_u} = \frac{1}{2} \kappa \left( F_u \right)^2 \omega_{u,n}\, \eta_{u,n}$$
where $M_{u,n}^{\mathrm{UE}}$ represents the earliest available time of the user's processor, meaning that, if other subtasks are currently executing, it is necessary to wait for their completion before processing the current subtask. $\kappa$ is an energy factor that depends on the CPU chip architecture.
(2) MEC Server Execution: Using $F_0$ to represent the total computational resources of the MEC server, the computational resources that each user accessing the server can obtain are expressed as follows:
$$f_u = \frac{F_0}{U}$$
When the n-th subtask of the u-th user is executed on the MEC server, the required computation time is expressed as follows:
$$t_{u,n}^{\mathrm{MEC}} = \frac{\omega_{u,n}\, \eta_{u,n}}{f_u}$$
The readiness time of the subtask is expressed as follows:
$$RT_{u,n}^{\mathrm{MEC}} = \max_{m \in pre(n)} \left\{ \max\!\left( FT_{u,m}^{\mathrm{UE}},\ M_{u,m}^{\mathrm{ul}} \right) + t_{u,m}^{\mathrm{ul}},\ FT_{u,m}^{\mathrm{MEC}} \right\}$$
where $M_{u,m}^{\mathrm{ul}}$ represents the earliest available time for uplink communication. Therefore, when the n-th subtask is executed on the MEC server, the corresponding finish time $FT_{u,n}^{\mathrm{MEC}}$ is expressed as follows:
$$FT_{u,n}^{\mathrm{MEC}} = \max\!\left( RT_{u,n}^{\mathrm{MEC}},\ M_{u,n}^{\mathrm{MEC}} \right) + t_{u,n}^{\mathrm{MEC}}$$
where $M_{u,n}^{\mathrm{MEC}}$ represents the earliest available time of the computational resources that the MEC server allocates to the user. In this paper, we do not consider the energy consumption associated with executing XR tasks on MEC servers, as MEC servers are typically powered directly by alternating current and have sufficient energy resources to handle offloaded tasks.
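The readiness- and finish-time bookkeeping can be illustrated with a simplified single-user sketch. The code below is our own illustration under several assumptions: the earliest-available times $M^{\mathrm{UE}}$, $M^{\mathrm{MEC}}$, $M^{\mathrm{ul}}$, and $M^{\mathrm{dl}}$ are tracked as simple per-resource clocks, contention from other users is ignored, the final download of exit-subtask results is omitted, and all numerical values are hypothetical.

# Simplified single-user sketch of the finish-time recursion (Section 2.3).
# Resource "clocks" stand in for the earliest-available times M^{UE}, M^{MEC},
# M^{ul}, M^{dl}; multi-user contention and exit-result downloads are omitted.
F_u, f_u = 3.0e9, 4.0e9          # local and allotted MEC CPU cycles/s (hypothetical)
r_ul = r_dl = 20e6               # link rates in bit/s (hypothetical)

pred  = {1: [], 2: [1], 3: [1], 4: [2, 3]}
omega = {1: 5e5, 2: 3e5, 3: 3e5, 4: 2.7e5}   # input sizes (bits)
eta   = {n: 1000 for n in pred}              # cycles per bit
phi   = {1: 0.6, 2: 0.4, 3: 0.5, 4: 0.1}
x     = {1: 0, 2: 1, 3: 0, 4: 1}             # 0 = local, 1 = MEC (entry must be local)

clock = {"UE": 0.0, "MEC": 0.0, "ul": 0.0, "dl": 0.0}
FT = {}                                      # finish times per subtask

for n in sorted(pred):                       # topological order assumed
    ready = 0.0
    for m in pred[n]:
        arrive = FT[m]
        if x[m] != x[n]:                     # result must cross the wireless link
            link = "ul" if x[n] == 1 else "dl"
            rate = r_ul if link == "ul" else r_dl
            t_tx = phi[m] * omega[m] / rate
            start = max(FT[m], clock[link])  # wait for the result and a free link
            clock[link] = start + t_tx
            arrive = start + t_tx
        ready = max(ready, arrive)
    proc = "MEC" if x[n] == 1 else "UE"
    t_cpu = omega[n] * eta[n] / (f_u if x[n] == 1 else F_u)
    start = max(ready, clock[proc])          # wait for readiness and a free processor
    FT[n] = start + t_cpu
    clock[proc] = FT[n]

print({n: round(t * 1e3, 2) for n, t in FT.items()})   # finish times in ms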

3. Problem Formulation

Our goal is to simultaneously optimize the channel access strategy and the task offloading strategy, aiming to enhance the energy conversion efficiency of the user equipment while ensuring that the user QoE requirements are satisfied. Based on the analysis in Section 2, the total finish time and total energy consumption for the u-th user to complete its task are expressed as follows:
$$FT_u = \max_{n \in \mathcal{N},\, \Gamma_{u,n} = 2} \left\{ FT_{u,n}^{\mathrm{UE}},\ \max\!\left( FT_{u,n}^{\mathrm{MEC}},\ M_{u,n}^{\mathrm{dl}} \right) + t_{u,n}^{\mathrm{dl}} \right\}$$
$$E_u = \sum_{n \in \mathcal{N},\, x_{u,n} = 0} E_{u,n}^{\mathrm{UE}} + \sum_{n \in \mathcal{N},\, x_{u,n} = 0} \mathbb{I}\!\left( \exists\, m \in suc(n):\ x_{u,m} \ne 0 \right) P_u^{\mathrm{Tx}}\, t_{u,n}^{\mathrm{ul}} + \sum_{n \in \mathcal{N},\, x_{u,n} = 1} \mathbb{I}\!\left( \exists\, m \in suc(n):\ x_{u,m} = 0 \right) P_u^{\mathrm{Rx}}\, t_{u,n}^{\mathrm{dl}} + \sum_{n \in \mathcal{N},\, \Gamma_{u,n} = 2,\, x_{u,n} = 1} P_u^{\mathrm{Rx}}\, t_{u,n}^{\mathrm{dl}}$$
where $\mathbb{I}(\cdot)$ is an indicator function and $suc(n)$ represents the set of successor subtasks that depend on the n-th subtask. The finish time of the XR task is determined by the exit subtask whose computation result is returned to the user equipment last. The total energy consumption is composed of the following parts.
  • The computational energy consumption incurred by the execution of subtasks on user equipment.
  • The transmitting energy consumption when the user transmits the computation results to the base station.
  • The receiving energy consumption when the user receives the computation results from the base station.
According to International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Rec. P.1203.1 [24], we use the mean opinion score (MOS) as the measurement standard for QoE. Generally, MOS is divided into five levels: poor, fair, good, very good, and excellent, corresponding to ratings of one to five, respectively. We model the MOS of the u th user as follows [25]:
$$MOS_u = \left[ C_1 \ln\!\left( FT_u \right) + C_2 \right]_{\mathrm{dis}}$$
where $C_1$ and $C_2$ are constants that ensure the MOS falls within a reasonable range of values, and $[\cdot]_{\mathrm{dis}}$ represents the discrete mapping to the five QoE levels.
Our research objective is to maximize energy conversion efficiency while ensuring a satisfactory QoE for XR users. As in [26,27], we select an optimization objective that captures the desired QoE for users, specifically a weighted combination of the MOS and the total task energy consumption, which can be expressed as follows:
$$Q_u = \lambda_M\, MOS_u - \lambda_E\, E_u, \quad \mathrm{s.t.}\ MOS_u \ge MOS_u^{\mathrm{th}}$$
where $MOS_u^{\mathrm{th}}$ is the minimum QoE threshold set by the u-th user, and $\lambda_M, \lambda_E \in [0, 1]$ are scalar weights for the MOS and the energy consumption, respectively.
Let $X_u = \{x_{u,1}, \ldots, x_{u,n}, \ldots, x_{u,N}\}$ represent the task offloading decisions of the user. To maximize the energy conversion efficiency, the joint optimization problem of communication channel access and task offloading studied in this paper can be formulated as follows:
$$\max_{\mathbf{c}_u, X_u}\ Q_u, \quad \forall u \in \mathcal{U}$$
$$\mathrm{s.t.}\ (\mathrm{C1}):\ \sum_{k=1}^{K} c_u^k \le 1$$
$$(\mathrm{C2}):\ c_u^k \in \{0, 1\},\ x_{u,n} \in \{0, 1\}$$
$$(\mathrm{C3}):\ MOS_u \ge MOS_u^{\mathrm{th}}$$
Constraint (C1) indicates that the user can access a maximum of one channel at a time. Constraint (C2) places restrictions on the actions of XR users in selecting channels and offloading tasks. Constraint (C3) represents the minimum MOS threshold set by the user.
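As a small worked illustration of the objective and constraint (C3), the sketch below maps a completion time to a discrete MOS level and evaluates $Q_u$; the constants $C_1$, $C_2$, the weights, and the threshold are placeholder values, not the paper's settings.

import math

# Illustrative evaluation of the QoE/energy objective; C1, C2, the weights and
# the MOS threshold below are placeholders, not the paper's settings.
C1, C2 = -2.0, 1.0          # mapping constants (hypothetical)
lam_M, lam_E = 0.8, 0.2     # weights for MOS and energy
MOS_th = 3                  # minimum acceptable MOS level

def mos(finish_time_s):
    """Map task completion time to a discrete 1..5 MOS level."""
    raw = C1 * math.log(finish_time_s) + C2
    return int(min(5, max(1, round(raw))))

def objective(finish_time_s, energy_j):
    m = mos(finish_time_s)
    q = lam_M * m - lam_E * energy_j
    feasible = m >= MOS_th            # constraint (C3)
    return q, m, feasible

print(objective(finish_time_s=0.34, energy_j=0.6))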

4. DRL-Based Joint Optimization

In this section, we initially model the optimization problem as an MDP. Subsequently, we introduce an algorithm that utilizes MADDPG to support XR users in making channel selections and task offloading decisions.

4.1. Background

Due to the combination of channel access and task-offloading strategy, and as it involves a multi-user competitive problem with mixed integers, the optimization problem (24) is non-convex and NP-hard. Moreover, it is a challenge to make joint optimization decisions in rapidly changing and highly dynamic wireless networks. In recent years, DRL has seen rapid development. It is an emerging artificial intelligence technology that has been employed to address a wide range of decision and computational problems. Additionally, it is particularly recognized as an effective tool for tackling non-convex and complex optimization problems in highly dynamic wireless environments. Given the sensitivity of XR applications to latency, it is crucial for user devices to make prompt decisions. MADDPG adopts the architecture of centralized training and decentralized execution to address this requirement. After the training is completed, agents can make decisions independently, relying on their own observations. Additionally, MADDPG enables each agent to learn its action policy cooperatively, without requiring global knowledge or a central controller. This allows agents to dynamically adjust their behavior in response to changes in the environment, better adapting to complex environments and improving the ability to model potential nonlinear relationships during the task offloading process to achieve global optimality. Therefore, we use the MADDPG algorithm to train the agents.
The MDP is widely used for modeling reinforcement learning problems. It is typically represented by a tuple $\langle S, A, P, R, \gamma \rangle$, where $S$ denotes the set of states, $A$ denotes the set of actions, $P$ denotes the set of state transition probabilities, $R$ denotes the reward, and $\gamma$ is the reward discount factor. We model the joint optimization problem as the following MDP model.
(1) State Space: In our system, we assume that each XR device is running an XR application. The input data size of the u-th user is $\omega_{u,1}$, and the available local computational resources are represented by $F_u$. The XR device acquires the channel gains $H_u = \{h_{u,k} \mid k \in \mathcal{K}\}$ between itself and the base station using sensing technology. Additionally, the base station needs to broadcast the computational resources $F_0$ of the MEC server to all users. Therefore, the observation of the u-th user can be represented as follows:
$$o_u = \left\{ \omega_{u,1},\ H_u,\ F_u,\ F_0 \right\}$$
The state can be represented as the collective observation set of all users, denoted as $s = \{o_u \mid u \in \mathcal{U}\}$.
(2) Action Space: In the edge network, each XR device needs to select a channel for communication with the base station and decide whether each subtask should be computed on the local device or on the MEC server. The action of the u-th user can be represented as follows:
$$a_u = \left\{ \mathbf{c}_u,\ X_u \right\}$$
Therefore, the set of actions can be represented as $a = \{a_u \mid u \in \mathcal{U}\}$.
(3) Reward Function: Our optimization objective is to maximize the user's energy conversion efficiency, so we define the reward of the u-th user as follows:
$$r_u\!\left( a \mid s \right) = \frac{Q_u + \mathcal{X}\, \Delta MOS_u\, \varepsilon\!\left( \Delta MOS_u \right)}{\zeta}$$
$$\Delta MOS_u = MOS_u^{\mathrm{th}} - MOS_u$$
where $\mathcal{X}$ is the punishment coefficient, controlling the magnitude of the punishment when the agent violates the QoE constraint, $\varepsilon(\cdot)$ is a step function, and $\zeta$ is the reward scaling factor, which can be used to adjust rewards without altering the reward function, facilitating faster learning and improved asymptotic performance [28].
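The sketch below illustrates how one agent's observation, action, and reward could be assembled from these definitions. The flat vector layout, the sign convention of the punishment coefficient, and all numerical values are our own assumptions for illustration.

import numpy as np

# Illustrative construction of one agent's observation, action, and reward
# (values are placeholders; the exact vector layout is our own assumption).
K, N = 6, 4                                   # subchannels, subtasks

def observation(omega_1, h_u, F_u, F_0):
    # o_u = {omega_{u,1}, H_u, F_u, F_0} flattened into one vector
    return np.concatenate(([omega_1], h_u, [F_u, F_0])).astype(np.float32)

def action(channel, offload):
    # a_u = {c_u, X_u}: one-hot channel choice plus binary offloading decisions
    c = np.zeros(K); c[channel] = 1.0
    return np.concatenate((c, np.asarray(offload, dtype=float)))

def reward(Q_u, MOS_u, MOS_th, X=-5.0, zeta=10.0):
    # penalize QoE violations via the step function of Delta MOS
    # (negative punishment coefficient X is an assumed sign convention)
    d = MOS_th - MOS_u
    step = 1.0 if d > 0 else 0.0
    return (Q_u + X * d * step) / zeta

obs = observation(5e5, np.random.rayleigh(1e-5, K), 3.0e9, 12e9)
act = action(channel=2, offload=[0, 1, 0, 1])
print(obs.shape, act.shape, reward(Q_u=2.3, MOS_u=2, MOS_th=3))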

4.2. The Proposed Algorithm

Next, we describe the specific details of the proposed algorithm. Based on the MADDPG framework, we treat each XR device as an agent. Each agent has two types of networks: a policy network and a value network. The policy network determines the next action based on the agent's own observation, and the value network evaluates the quality of the current action based on the global state and the actions of all agents. Additionally, target networks are used to mitigate the problem of the overestimation of Q-values during the training process. We use $\mu_u$ and $\mu_u'$ to represent the policy network and the target policy network, with parameters $\theta_u$ and $\theta_u'$, respectively, and $q_u$ and $q_u'$ to represent the value network and the target value network, with parameters $w_u$ and $w_u'$, respectively.
MADDPG is an off-policy algorithm, so we can utilize experience replay to reuse past experiences. We denote the replay buffer by $\Phi$, which stores tuples $\left( S_t, A_t, S_{t+1}, R_t \right)$ representing the current system state, the actions of all agents, the next system state, and the rewards. Additionally, we use an $\epsilon$-greedy policy to control the agents' interactions with the environment, facilitating exploration.
(1) Centralized Training: During training, we deploy policy networks to the MEC server. At this point, XR devices cannot make decisions independently and must follow the instructions sent by the base station. Training begins when a sufficient amount of data has been collected in the replay pool. In each training iteration, a small batch of data is randomly sampled from Φ to train the neural networks. The target value network provides the expected Q-value y t , u , based on the action generated by the target policy network, as follows:
$$y_{t,u} = r_{t,u}\!\left( a_t \mid s_t \right) + \gamma\, q_u'\!\left( s_{t+1},\ \mu_u'\!\left( o_{t+1,u} \mid \theta_u' \right) \mid w_u' \right)$$
where γ is the reward discount factor. The value network updates its parameters w u through gradient descent using a mean squared error loss function, as follows:
$$L_u\!\left( w_u \right) = \frac{1}{M} \sum_{i=1}^{M} \left( y_{t,u} - q_u\!\left( s_t, a_{t,u} \mid w_u \right) \right)^2$$
where M is the size of a small batch. With the assistance of the value network, the policy network updates its parameters θ u using gradient ascent, as follows:
$$\nabla_{\theta_u} J \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_{\theta_u} \mu_u\!\left( o_{t,u} \mid \theta_u \right)\, \nabla_{a_{t,u}} q_u\!\left( s_t, a_t \mid w_u \right)$$
Finally, the parameters of the target policy network and target value network are softly updated, as follows:
$$\theta_u' = \tau\, \theta_u + \left( 1 - \tau \right) \theta_u'$$
$$w_u' = \tau\, w_u + \left( 1 - \tau \right) w_u'$$
where $\tau \in (0, 1)$ is the soft update rate.
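A condensed PyTorch-style sketch of one agent's update step is given below as an illustration of Equations (30)-(33); it is not the authors' code. It assumes the critic is a module taking (state, joint action), that agent u's action occupies a known contiguous slice of the joint action vector, and that a mini-batch has already been sampled from $\Phi$.

import torch
import torch.nn.functional as F

# Condensed sketch of one MADDPG update for agent u (our own PyTorch
# illustration, not the authors' code). actor_u/critic_u and their targets are
# assumed to be nn.Module MLPs; the critic is assumed to take (state, actions).
GAMMA, TAU = 0.99, 0.01

def soft_update(target, source, tau=TAU):
    # theta' <- tau * theta + (1 - tau) * theta'   (cf. Eqs. (32)-(33))
    for p_t, p in zip(target.parameters(), source.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def update_agent(actor_u, actor_u_tgt, critic_u, critic_u_tgt,
                 actor_opt, critic_opt, batch, u_slice):
    s, a, r, s_next, o_u, o_next_u, a_next_others = batch
    # Critic: minimize the MSE between Q(s, a) and the target value y (cf. Eq. (30)).
    with torch.no_grad():
        a_next = torch.cat([actor_u_tgt(o_next_u), a_next_others], dim=-1)
        y = r + GAMMA * critic_u_tgt(s_next, a_next)
    critic_loss = F.mse_loss(critic_u(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's Q w.r.t. agent u's own action (cf. Eq. (31)).
    a_u_new = actor_u(o_u)
    a_new = torch.cat([a[:, :u_slice.start], a_u_new, a[:, u_slice.stop:]], dim=-1)
    actor_loss = -critic_u(s, a_new).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    soft_update(actor_u_tgt, actor_u)
    soft_update(critic_u_tgt, critic_u)

In practice, the discrete channel-selection and offloading actions are often handled with a relaxation such as Gumbel-Softmax so that the actor remains differentiable; the sketch abstracts this detail away.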
(2) Decentralized Execution: After training is completed, the value network is no longer needed. Then, we deploy the policy network to the corresponding XR devices. At this point, the u th user can make decisions independently for channel access and task offloading based on its own observations, achieving real-time decision-making.
Based on the MDP and analysis described above, the training process for the proposed algorithm is shown in Algorithm 1. The MADDPG algorithm utilizes actor–critic (AC) networks with deep neural networks composed of input layers, output layers, and hidden layers. The actor network takes as input a state space composed of input data size, local available resources, channel gains, and MEC available resources, producing channel selection and task-offloading decisions as outputs. The critic network takes the concatenation of state and action spaces as inputs and outputs Q-values. During each training step, MADDPG samples and stores experiences from T agents, then batches M experiences for training each agent.
Algorithm 1. Training Process of Our Proposed Model
Initialize the experience replay buffer Φ;
Initialize the edge network environment with U agents;
for episode = 1, 2, … do
  for t = 1, 2, …, T do
    for each UE u ∈ U do
      Observe an observation o_{t,u} from the environment;
      Select an action a_{t,u} based on the ϵ-greedy policy and the policy network;
    end for
    Execute the joint action A_t = {a_{t,1}, …, a_{t,U}};
    Observe the reward R_t = {r_{t,1}, …, r_{t,U}} and the next state S_{t+1};
    Store (S_t, A_t, S_{t+1}, R_t) in buffer Φ;
    if the buffer is ready and it is time to learn then
      Randomly sample a mini-batch of M samples from Φ;
      for each UE u ∈ U do
        Update the value network by minimizing the loss function in (30);
        Update the policy network using gradient ascent with (31);
        Update the target network parameters with (32) and (33);
      end for
    end if
  end for
end for
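For reference, the control flow of Algorithm 1 can be mirrored by a compact training skeleton such as the one below (our own illustration). The environment and agent interfaces (env.reset, env.step, agent.act, agent.random_act, agent.learn) are hypothetical placeholders, and the interpretation of ϵ as the probability of exploiting the learned policy is an assumption.

import random
from collections import deque

# Skeleton mirroring Algorithm 1 (our own illustration). `env` and `agents`
# are assumed objects: env.reset()/env.step(actions) and agent.act(obs),
# agent.random_act(), agent.learn(batch) are hypothetical interfaces.
def train(env, agents, episodes=2000, steps=20, batch_size=64,
          buffer_size=2048, epsilon=0.9):
    buffer = deque(maxlen=buffer_size)                    # replay buffer Phi
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(steps):
            actions = [ag.act(o) if random.random() < epsilon else ag.random_act()
                       for ag, o in zip(agents, obs)]     # epsilon-greedy selection
            next_obs, rewards = env.step(actions)
            buffer.append((obs, actions, next_obs, rewards))
            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size) # mini-batch of M samples
                for ag in agents:
                    ag.learn(batch)                       # critic/actor/target updates
            obs = next_obs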
The primary factor influencing time complexity is the dimensionality of the network structure. The computational complexity required to train each actor with a segment of experiences is $O_a = O\!\left( RH + H^2 + HA \right)$, where $R$ represents the dimensionality of the state space, $H$ represents the number of neurons in the hidden layers, and $A$ represents the dimensionality of the action space. Similarly, the time complexity of the critic network can be represented as $O_c = O\!\left( (R + A)H + H^2 + H \right)$. Given that the target AC networks share the same network structure as the AC networks, the algorithmic complexity for a single agent can be expressed as $O_s = O\!\left( 2\left( RH + H^2 + HA \right) + 2\left( (R + A)H + H^2 + H \right) \right)$. Ultimately, the overall time complexity of the algorithm is determined to be $O = O\!\left( 2TM\left( RH + H^2 + HA + (R + A)H + H^2 + H \right) \right)$ [29].
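Plugging illustrative dimensions into these expressions gives a feel for the operation counts; the values below (in particular the single hidden width H) are simplifying assumptions, not the paper's exact configuration.

# Evaluate the stated complexity terms for illustrative dimensions
# (hypothetical values; a single hidden width H is a simplification).
R, A, H = 9, 10, 128          # state dim, action dim, hidden width (assumed)
T, M = 20, 64                 # steps per episode, mini-batch size
O_actor  = R * H + H**2 + H * A
O_critic = (R + A) * H + H**2 + H
O_total  = 2 * T * M * (O_actor + O_critic)
print(O_actor, O_critic, O_total)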

5. Simulation Results and Analysis

In this section, we initially present the experimental setup and specific simulation parameter configurations, followed by the presentation of simulation results and the subsequent analysis.

5.1. Simulation Settings

We consider a cellular cell with four XR users, where the base station is equipped with a MEC server and operates six orthogonal channels, and the user devices are randomly distributed within the coverage area of the base station. Each XR user runs an XR application with a data rate of 30 Mbps at 60 frames per second. We assume that processing one frame constitutes an XR task, and the size of an XR video frame, $\omega_{u,1}$, is modeled as a truncated Gaussian distribution, following the guidelines outlined in the third Generation Partnership Project Technical Report (3GPP TR) 38.838 [30]. Unless otherwise specified, we split the XR task according to the topology of Task 2 in Figure 2. Other parameters of the simulation scenario are presented in Table 1.
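As an illustration of the traffic model, the sketch below draws per-frame task sizes from a truncated Gaussian. The mean of 0.5 Mbit per frame follows directly from 30 Mbps at 60 frames per second, while the standard deviation, the truncation bounds, and the use of simple clipping are our own assumptions rather than the exact TR 38.838 parameters.

import numpy as np

# Sketch of drawing per-frame XR task sizes from a truncated Gaussian.
# Mean frame size = 30 Mbps / 60 fps = 0.5 Mbit; the 10.5% std and the
# [0.5, 1.5] x mean truncation below are illustrative assumptions.
rng = np.random.default_rng(0)

def frame_sizes_bits(n_frames, rate_bps=30e6, fps=60, std_ratio=0.105):
    mean = rate_bps / fps                      # 5e5 bits per frame
    lo, hi = 0.5 * mean, 1.5 * mean
    samples = rng.normal(mean, std_ratio * mean, size=n_frames)
    return np.clip(samples, lo, hi)            # simple clipping used as truncation

print(frame_sizes_bits(5) / 1e6, "Mbit")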
The proposed algorithm utilizes a neural network with three hidden layers employing Rectified Linear Unit (ReLU) activation functions, featuring 64, 128, and 64 neurons in the respective layers. The policy and value networks employ learning rates of 0.01 and 0.0008, respectively, updated using the Adam gradient optimization method. The target network parameters are updated softly at a rate of 0.01. The ϵ of the ϵ-greedy policy is 0.9. The reward discount factor is set to 0.99. The capacity of the replay buffer is 2048, and 64 samples are drawn per mini-batch. The training process spans 2000 episodes, each comprising 20 steps.
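For convenience, the training hyperparameters listed above can be collected in a single configuration object, as in the minimal sketch below (the key names are our own).

# Hyperparameters from Section 5.1 gathered in one place (key names are ours).
CONFIG = {
    "hidden_layers": (64, 128, 64),   # ReLU activations
    "actor_lr": 0.01,                 # policy network learning rate
    "critic_lr": 0.0008,              # value network learning rate
    "optimizer": "Adam",
    "soft_update_tau": 0.01,
    "epsilon_greedy": 0.9,
    "gamma": 0.99,                    # reward discount factor
    "replay_buffer_size": 2048,
    "batch_size": 64,
    "episodes": 2000,
    "steps_per_episode": 20,
}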

5.2. Performance of Our Proposed Algorithm

In the simulation, we compared our proposed algorithm with the following schemes or algorithms.
(1) Local: All XR tasks are executed on the local devices.
(2) Random Access and Execute in MEC (RA-MEC): XR users randomly select a channel to access the base station, after which all subtasks are offloaded to the MEC server for execution.
(3) Strategy Access and Execute in MEC (SA-MEC): In the SA-MEC scheme, our proposed algorithm’s access strategy is used to minimize interference, followed by the offloading of all subtasks to the MEC server for execution.
(4) Independent Q-Learning algorithm (IQL): This scheme employs the independent Q-learning algorithm [31] to train channel access and task offloading strategies. Independent Q-learning is a decentralized multi-agent reinforcement learning algorithm where each agent learns a policy based solely on its own actions and observations, treating other agents as part of the environment.
(5) Double Deep Q-Network (DDQN): The approach utilizes DDQN [32] to train channel access and task offloading strategies. This algorithm is an enhanced version of the deep Q-network (DQN) algorithm. By alternating training between two neural networks—one for selecting optimal actions and the other for evaluating the Q-values of these optimal actions—the method achieves individual agent optimization.
The trends in rewards and losses of our proposed algorithm are shown in Figure 3. Due to the non-stationary nature of the environment, rewards exhibit significant fluctuations. To better observe the convergence trend, we have smoothed the rewards, as indicated by the red line. From the rewards and losses, it can be observed that the channel access and task-offloading strategies continuously improve with increasing training epochs, and that the algorithm tends to converge after approximately 700 iterations.
The performance comparisons in terms of reward and energy consumption are shown in Figure 4 and Figure 5. As shown in Figure 4, regardless of whether the MEC server has limited or abundant computational resources, the MADDPG algorithm consistently achieves a higher reward. When there are sufficient computing resources on the MEC server and under favorable channel conditions, offloading all subtasks to the MEC server is evidently the optimal solution. However, because the SA-MEC scheme only optimizes the channel access strategy without considering the impact of channel access on task offloading, its performance consistently remains lower than that of the proposed algorithm. The RA-MEC scheme, which employs a random access strategy, may suffer from poor performance due to the significant interference that XR devices encounter when communicating with the base station, leading to increased task-processing costs. As for the IQL algorithm, agents ignore the observations and actions of other agents, so the state transition probabilities are non-stationary, making the algorithm less effective than our proposed algorithm. The DDQN algorithm builds upon IQL by adding a separate Q-value network to estimate the optimal action, reducing bias in the action-value function estimation and improving the accuracy of optimal action selection to some extent. Therefore, DDQN tends to outperform IQL, but due to the absence of inter-agent cooperation, it falls slightly short of MADDPG.
As shown in Figure 5, as the computing resources of the MEC server increase, the MADDPG-based algorithm can offload more subtasks to the MEC server for execution, thereby continuously reducing the energy consumption of user devices. For users located farther from the base station, our proposed algorithm tends to execute subtasks locally. As a result, the energy consumption of our proposed algorithm is higher than that of the schemes that offload all subtasks. In the RA-MEC scheme, XR devices do not execute subtasks, resulting in zero computational energy consumption. However, due to communication interference, the transmission energy consumption increases significantly, resulting in a high overall energy consumption.
For the different task partitioning approaches shown in Figure 2, when the computing resources of the MEC server are set to 12 GHz, the relative rewards achieved by each scheme are illustrated in Figure 6. When the dependencies between subtasks are relatively simple, there is little difference in the rewards achieved by the various schemes. The proposed algorithm achieves higher rewards when tasks are split according to Task 2 or Task 3. This is because, in Task 1, all subtasks must be executed sequentially, so only one computing node among the MEC server and the user device is working at any given time. In contrast, Tasks 2 and 3 allow for the parallel execution of subtasks, thus offering a higher upper limit on application performance. In Task 3, due to the increased number of subtasks and the more complex dependencies, it becomes more challenging to formulate channel access and task-offloading strategies; therefore, compared with Task 2, it exhibits some degree of performance decline. As tasks become increasingly complex, the rewards obtained by the proposed MADDPG algorithm consistently outperform those of the other approaches.
In summary, our proposed algorithm effectively meets users' MOS requirements while reducing XR device energy consumption and improving energy conversion efficiency. It demonstrates strong adaptability, as it consistently delivers favorable outcomes across varying MEC server computing resource levels. Additionally, our proposed algorithm demonstrates notable proficiency in handling various XR task-partitioning methods, particularly for subtasks that involve complex dependencies.

6. Conclusions

In this paper, we focus on the operation of XR devices in edge networks, where tasks are offloaded to MEC servers using wireless communication. Our objective is to enhance service quality and improve user experiences. Within our system architecture, we conduct the joint optimization of communication channel access and task offloading, aiming to maximize energy conversion efficiency while ensuring that user QoE requirements are met. To address this joint optimization problem, we have introduced an algorithm based on MADDPG, designed to optimize the decision-making process for channel selection and task offloading. The simulation results have demonstrated the excellent performance of our algorithm across different XR task partitioning methods. It effectively meets the users' QoE demands and enhances energy conversion efficiency, thereby validating the effectiveness of our proposed approach.
This paper only investigates the problem of task offloading for multiple XR users under a single base station. When the MEC servers deployed locally at the base station are heavily loaded, offloading tasks to idle MEC servers at nearby base stations is another strategy. However, this inevitably increases the task completion latency. Therefore, in the future, we will explore the collaborative completion of computing tasks by multiple XR users and multiple MEC servers to achieve better XR application performance.

Author Contributions

Conceptualization, X.Y., S.Z. and B.W.; methodology, X.Y., S.Z. and B.W.; validation, X.Y., S.Z. and B.W.; formal analysis, X.Y., S.Z. and B.W.; investigation, X.Y., S.Z. and B.W.; writing—original draft preparation, X.Y., S.Z. and B.W.; writing—review and editing, X.Y., S.Z. and B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (U2340221), and the Fundamental Research Funds for the Central Universities (grant number B230201057).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Borhani, Z.; Sharma, P.; Ortega, F.R. Survey of Annotations in Extended Reality Systems. IEEE Trans. Vis. Comput. Graph. 2023, 1–20.
  2. Dai, J.; Zhang, Z.; Mao, S.; Liu, D. A View Synthesis-based 360° VR Caching System over MEC-enabled C-RAN. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3843–3855.
  3. Trinh, B.; Muntean, G.-M. A Deep Reinforcement Learning-based Resource Management Scheme for SDN-MEC-supported XR Applications. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; pp. 790–795.
  4. Du, J.; Yu, F.R.; Lu, G.; Wang, J.; Jiang, J.; Chu, X. MEC-Assisted Immersive VR Video Streaming Over Terahertz Wireless Networks: A Deep Reinforcement Learning Approach. IEEE Internet Things J. 2020, 7, 9517–9529.
  5. Luo, J.; Liu, B.; Gao, H.; Su, X. Distributed Deep Reinforcement Learning Based Mode Selection and Resource Allocation for VR Transmission in Edge Networks. In Proceedings of the International Conference on Communications and Networking in China (ChinaCom), Virtual Event, 21–22 November 2021; pp. 153–167.
  6. Liu, X.; Deng, Y. Learning-Based Prediction, Rendering and Association Optimization for MEC-Enabled Wireless Virtual Reality (VR) Networks. IEEE Trans. Wirel. Commun. 2021, 20, 6356–6370.
  7. Chen, M.; Liu, W.; Wang, T.; Liu, A.; Zeng, Z. Edge intelligence computing for mobile augmented reality with deep reinforcement learning approach. Comput. Netw. 2021, 195, 108186.
  8. Goh, Y.; Choi, M.; Jung, J.; Chung, J.M. Partial Offloading MEC Optimization Scheme using Deep Reinforcement Learning for XR Real-Time M&S Devices. In Proceedings of the 2022 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 7–9 January 2022; pp. 1–3.
  9. Liu, W.; Ren, J.; Huang, G.; He, Y.; Yu, G. Data offloading and sharing for latency minimization in augmented reality based on mobile-edge computing. In Proceedings of the 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 27–30 August 2018; pp. 1–5.
  10. Mustafa, E.; Shuja, J.; Rehman, F.; Riaz, A.; Maray, M.; Bilal, M.; Khan, M.K. Deep Neural Networks meet computation offloading in mobile edge networks: Applications, taxonomy, and open issues. J. Netw. Comput. Appl. 2024, 226, 103886.
  11. Hao, Y.; Chen, M.; Hu, L.; Hossain, M.S.; Ghoneim, A. Energy efficient task caching and offloading for mobile edge computing. IEEE Access 2018, 6, 11365–11373.
  12. Chen, X.; Liu, G. Energy-efficient task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge networks. IEEE Internet Things J. 2021, 8, 10843–10856.
  13. Wang, J.; Hu, J.; Min, G.; Zhan, W.; Zomaya, A.Y.; Georgalas, N. Dependent task offloading for edge computing based on deep reinforcement learning. IEEE Trans. Comput. 2021, 71, 2449–2461.
  14. Liu, J.; Mi, Y.; Zhang, X.; Li, X. Task graph offloading via deep reinforcement learning in mobile edge computing. Future Gener. Comput. Syst. 2024, 158, 545–555.
  15. Liu, J.; Ren, J.; Zhang, Y.; Peng, X.; Zhang, Y.; Yang, Y. Efficient dependent task offloading for multiple applications in MEC-cloud system. IEEE Trans. Mob. Comput. 2021, 22, 2147–2162.
  16. Maray, M.; Mustafa, E.; Shuja, J.; Bilal, M. Dependent task offloading with deadline-aware scheduling in mobile edge networks. Internet Things 2023, 23, 100868.
  17. Zhang, S.; He, P.; Suto, K.; Yang, P.; Zhao, L.; Shen, X. Cooperative edge caching in user-centric clustered mobile networks. IEEE Trans. Mob. Comput. 2017, 17, 1791–1805.
  18. Kang, H.; Chang, X.; Mišić, J.; Mišić, V.B.; Fan, J.; Liu, Y. Cooperative UAV resource allocation and task offloading in hierarchical aerial computing systems: A MAPPO based approach. IEEE Internet Things J. 2023, 10, 10497–10509.
  19. Shinde, S.S.; Tarchi, D. A Markov decision process solution for energy-saving network selection and computation offloading in vehicular networks. IEEE Trans. Veh. Technol. 2023, 72, 12031–12046.
  20. Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6379–6390.
  21. Zhao, N.; Liang, Y.C.; Niyato, D.; Pei, Y.; Wu, M.; Jiang, Y. Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks. IEEE Trans. Wirel. Commun. 2019, 18, 5141–5152.
  22. Chung, S.T.; Goldsmith, A.J. Degrees of freedom in adaptive modulation: A unified view. IEEE Trans. Commun. 2001, 49, 1561–1571.
  23. Datta, A.K.; Patel, R. CPU scheduling for power/energy management on multicore processors using cache miss and context switch data. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 1190–1199.
  24. ITU-T. P.1203.3: Parametric Bitstream-Based Quality Assessment of Progressive Download and Adaptive Audiovisual Streaming Services over Reliable Transport-Quality Integration Module; International Telecommunication Union: Geneva, Switzerland, 2017.
  25. Zhou, Y.; Ma, X.; Hu, S.; Zhou, D.; Cheng, N.; Lu, N. QoE-driven adaptive deployment strategy of multi-UAV networks based on hybrid deep reinforcement learning. IEEE Internet Things J. 2021, 9, 5868–5881.
  26. Yan, J.; Bi, S.; Zhang, Y.J.; Tao, M. Optimal task offloading and resource allocation in mobile-edge computing with inter-user task dependency. IEEE Trans. Wirel. Commun. 2019, 19, 235–250.
  27. Tang, Z.; Lou, J.; Zhang, F.; Jia, W. Dependent task offloading for multiple jobs in edge computing. In Proceedings of the 2020 29th International Conference on Computer Communications and Networks (ICCCN), Honolulu, HI, USA, 3–6 August 2020; pp. 1–9.
  28. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
  29. Du, J.; Kong, Z.; Sun, A.; Kang, J.; Niyato, D.; Chu, X.; Yu, F.R. MADDPG-based joint service placement and task offloading in MEC empowered air-ground integrated networks. IEEE Internet Things J. 2023, 11, 10600–10615.
  30. 3GPP TR 38.838. Study on XR (Extended Reality) Evaluations for NR; Rel-17, V1.0.1; 4 November 2021.
  31. Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA, 27–29 July 1993; pp. 330–337.
  32. Wang, Y.; Huang, Z.; Wei, Z.; Zhao, J. MADDPG-Based Offloading Strategy for Timing-Dependent Tasks in Edge Computing. Future Internet 2024, 16, 181.
Figure 1. System model of the extended reality application in mobile edge network.
Figure 2. Three DAGs under different task partitioning strategies.
Figure 3. Reward and loss in the training process of the proposed algorithm.
Figure 4. Performance of rewards.
Figure 5. Performance of energy.
Figure 6. Performance comparison under different task division modes.
Table 1. Simulation parameters.

Parameter | Value
Uplink channel bandwidth, $B^{\mathrm{ul}}$ | 2.286 MHz
Downlink channel bandwidth, $B^{\mathrm{dl}}$ | 7.429 MHz
Noise power spectral density, $n_0$ | −174 dBm/Hz
Path loss model | 127 + 30 log(d)
Threshold of BER, $\epsilon$ | $10^{-4}$
Sending power of XR user, $P_u^{\mathrm{Tx}}$ | 0.5 W
Receiving power of XR user, $P_u^{\mathrm{Rx}}$ | 0.5 W
Sending power of base station, $P_0$ | 1 W
Computing resources of XR user, $F_u$ | [2.6, 4.3] GHz
Computing resources of MEC server, $F_0$ | [9, 22] GHz
Energy consumption factor of XR user, $\kappa$ | $10^{-27}$
MOS threshold of XR user, $MOS_u^{\mathrm{th}}$ | 3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
