1. Introduction
Extended reality (XR) is an advanced immersive technology in human–computer interaction, encompassing augmented reality (AR), virtual reality (VR), mixed reality (MR), and other emerging immersive technologies. AR overlays digital objects onto the real world, VR fully immerses users in a virtual environment, and MR combines elements of both, offering varying degrees of immersion and interactivity [
2]. XR applications demand extremely low latency. Typically, XR applications are based on 360° panoramic videos, which require a motion-to-photon latency of less than 20 ms [
2]. Furthermore, as XR applications often involve computationally intensive tasks processed on mobile devices, device power consumption becomes a critical factor limiting XR application performance. Multi-access edge computing (MEC) provides computational capabilities at the network edge near the user, within the radio access network (RAN), enabling lower latency and reduced backhaul network traffic [
How to fully utilize the computational resources of MEC servers and user equipment (UE) in a highly dynamic edge network to provide stable and reliable services for XR applications is a highly challenging problem.
Recently, there have been multiple initiatives exploring the use of MEC for task offloading in XR applications. A common approach is to employ binary offloading, where the entire XR task is offloaded to the MEC server as a unit [
4,
5,
6,
The work in [
4] optimizes joint rendering offloading and downlink power control in bandwidth-rich terahertz communication environments to minimize long-term energy consumption. Field-of-view prediction for VR users and the migration of rendering tasks to MEC servers are employed to maximize the long-term quality of experience (QoE) in [
6]. Another approach involves partitioning XR tasks into two arbitrary portions for partial offloading [
8,
9]. A data-sharing model for AR tasks is established in [
9], aiming to optimize both the task-partitioning strategy and resource allocation, ultimately improving the quality of experience for AR device users. To obtain effective binary offloading strategies, a convolutional neural network trained from experience has been used to generate near-optimal offloading actions while preserving computational efficiency [
10].
Offloading XR tasks as a unit in a binary fashion is a straightforward approach, but it may not fully utilize the computational resources available in the edge network. Partial offloading schemes, on the other hand, enable the simultaneous computation of XR tasks on both MEC servers and user equipment, but the arbitrary partitioning of XR tasks can be challenging to implement effectively in practice. According to [
11,
12], XR tasks are divided into five subtasks based on functional components and then offloaded at the granularity of subtasks to increase computational resource utilization efficiency and reduce user energy consumption. However, the former does not consider dependencies among subtasks, and the latter designs a task-offloading strategy only for serially executed subtasks, without exploring the possibility of parallel execution. Dependent tasks exhibit higher offloading dynamics [
13]; that work focuses on changes in inter-task dependency constraints to improve the quality of service of task graphs. However, it does not consider changes in the computing capabilities of edge computing devices. In dynamic MEC systems, Markov decision processes have been applied to task graph scheduling: agents observe changes in the MEC system and intelligently schedule dependent tasks, providing computing services for application task execution in a first-come, first-served manner and thereby enhancing user experience [
14]. However, that work assumes that the computation results of each task must be transmitted back to the user, i.e., no data are transferred between tasks. In mobile user scenarios, users can adaptively offload computing tasks with dependency constraints to the MEC server or the cloud to enhance user experience. In this context, the problem of minimizing task completion time has been proven to be non-deterministic polynomial (NP)-hard [
15]. When scheduling correlated tasks using priority awareness, tasks are categorized based on whether they have deadlines. The offloading process is modeled using a directed acyclic graph (DAG), incorporating the impact of user mobility on task-offloading decisions to select the optimal edge processor, thereby reducing completion times and enhancing task satisfaction [
16]. However, these articles evaluate performance using only objective data or metrics, ignoring the subjective user experience. For XR subtasks, only the final computation results need to be transmitted, which avoids unnecessary communication overhead. Therefore, when designing XR task-offloading methods, it is necessary to consider both the mixed serial–parallel relationships among subtasks and the content transferred between them. In previous works, system performance metrics have primarily focused on latency and energy consumption. However, for XR applications, which are known for their immersive nature, the subjective user experience should be the primary evaluation metric. QoE, as an indicator of video application performance, captures user satisfaction with video services and has been shown to be a superior metric for optimizing network resource allocation [
17].
In this paper, we focus on the collaborative utilization of computational resources in an edge network to enhance XR application computations. Given the limited computational capabilities of XR devices, offloading subtasks to MEC servers through an appropriate strategy can improve the QoE. Wireless communication is employed between terminals and MEC servers to facilitate this process. To address this requirement, we model the XR task as a DAG to represent the dependencies among subtasks. We propose a joint optimization problem that considers both communication channel access and task offloading to maximize energy conversion efficiency, which is defined as the ratio of QoE to energy consumption. Since XR task partitioning involves a complex joint optimization problem, we model it as a Markov decision process (MDP). In a real edge computing environment, the system is continuously non-discrete. When tasks are assigned to processors, in order to ensure the effective processing of user requests, the establishment of Markov decision processes is achieved by dividing the continuous-time state transitions into multiple continuous slots [
18,
19]. This method can bridge the gap between continuous-time models and discrete-time models. To solve this problem, we introduce a channel access strategy and a task offloading strategy based on deep reinforcement learning (DRL), specifically leveraging the multi-agent deep deterministic policy gradient (MADDPG) method [
20]. This approach significantly enhances user experience quality while improving energy conversion efficiency for user equipment. The remainder of the paper is organized as follows.
Section 2 constructs the edge computing network model and describes the XR task model. Section 3 formalizes the optimization problem. Section 4 presents the proposed algorithm for solving the problem. Section 5 showcases simulation results and provides evaluations. Finally, in Section 6, we summarize the paper.
2. System Model
We consider a wireless edge computing network model serving XR applications, as illustrated in
Figure 1. The system comprises a base station equipped with a MEC server and
XR terminal devices. The MEC server deploys a computational model to provide computation capabilities to the XR terminal devices within its coverage area. We denote the set of XR devices using
. Next, we describe the system from three perspectives: the XR task model, the communication model, and the task offloading model.
2.1. XR Task Model
We assume that each XR device processes only one XR task at a time, and the XR task can be partitioned into N subtasks, denoted by
. We employ DAG to represent the dependencies among these subtasks, where vertices represent subtasks, and arcs represent dependencies between the input and output of subtasks. The subtask located at the arc head requires the computation results of the subtask located at the arc tail as input. Therefore, a subtask can only be computed after its predecessor subtasks (i.e., located at the arc tail) have been executed.
Figure 2 illustrates three different DAGs resulting from various task partitioning methods.
We define a tuple for the th subtask of the th XR user, which is denoted by . Specifically, represents the input data size, represents the computational complexity, i.e., the number of Central Processing Unit (CPU) cycles required to process each bit of data, represents the ratio of output data size to input data size, and indicates the type of the subtask. signifies that the subtask is an entry subtask, which is not dependent on other subtasks. signifies an intermediate subtask, which depends on other subtasks and is also depended upon by other subtasks. signifies an exit subtask, which is dependent on other subtasks. In our task model, it is assumed that there is only one entry subtask, while there can be multiple exit subtasks. The entry subtask is typically responsible for receiving the entire input data for the task and partitioning the whole task, which means it can only be executed at the user’s terminal. All the computational results from the exit subtasks need to be transmitted to the user’s terminal, forming the computational result of the XR task.
Based on the analysis above, we can conclude that the input data size for the
th subtask is equal to the sum of the output data sizes of all the subtasks on which it depends, which is expressed as follows:
where
represents the set of predecessor subtasks on which the
th subtask depends. Therefore, as long as we are aware of the input data size for the entry subtask, we can deduce the input data sizes of all subtasks.
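This propagation rule can be sketched in code. The four-subtask diamond DAG and the output/input ratios below are hypothetical, chosen only to illustrate the recursion, and do not correspond to the task graphs of Figure 2.

```python
# Sketch: derive every subtask's input data size from the entry subtask's
# input, assuming (as in the text) that a subtask's input equals the sum of
# its predecessors' outputs, and each output is rho_v times the input of v.

def propagate_input_sizes(entry_input_bits, rho, preds):
    """rho[v]: output/input ratio of subtask v; preds[v]: predecessors of v.
    Subtask 0 is the single entry subtask; indices form a topological order."""
    n = len(rho)
    d_in = [0.0] * n
    d_in[0] = float(entry_input_bits)
    for v in range(1, n):
        d_in[v] = sum(d_in[u] * rho[u] for u in preds[v])
    return d_in

# hypothetical diamond DAG: 0 -> {1, 2} -> 3
preds = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
rho = [1.0, 0.5, 0.25, 0.1]
print(propagate_input_sizes(1e6, rho, preds))  # subtask 3 receives 750000.0 bits
```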
2.2. Communication Model
We model the communication scenario as follows. There is a wireless network connection between the base station and the XR devices. We assume the use of orthogonal frequency division multiplexing (OFDM) for multiple access, where users share
orthogonal subchannels, and the set of subchannels is denoted by
. During the user’s channel access process, the binary channel selection vector for the
th user is expressed by [
21], as follows:
where
,
indicates that the user is accessing the
th channel, and
indicates not accessing the
th channel. This assumes that each user can choose only one channel at the same time, i.e.,
.
When a user sends messages to the base station, it may encounter interference from other users, which can be expressed as follows:
where
represents the Rayleigh channel gain when
th user is using the
th channel, and
represents the transmit power of the user. The signal-to-interference-plus-noise ratio (SINR) for the uplink and the signal-to-noise ratio (SNR) for the downlink can be represented as follows:
where
represents the transmit power of the base station,
and
respectively represent the uplink and downlink channel bandwidths, and
represents the noise power spectral density. Additionally, using
to represent the threshold of bit error rate (BER), the uplink and downlink data transmission rates for the
th user can be calculated as follows [
22]
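As a sketch of the rate computation, the snippet below uses the SNR-gap approximation Gamma = -ln(5·BER)/1.5, a common closed form for M-QAM under a target BER; the exact expression in the paper follows [22], so the gap formula and the parameter values here should be read as assumptions.

```python
import math

def achievable_rate(bandwidth_hz, sinr_linear, ber=1e-5):
    # SNR-gap approximation: rate = W * log2(1 + SINR / Gamma), where the
    # gap Gamma discounts the usable SINR to meet the BER threshold
    # (assumed form, not necessarily the formula of [22]).
    gamma = -math.log(5 * ber) / 1.5
    return bandwidth_hz * math.log2(1 + sinr_linear / gamma)

r = achievable_rate(10e6, 100.0)  # 10 MHz channel at 20 dB SINR
print(r / 1e6, "Mbit/s")          # roughly 40 Mbit/s
```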
Therefore, the delay for uploading the computation result of the
th subtask for the
th XR user is equal to the size of the computation result divided by the uplink transmission rate, which is expressed as follows:
The transmission energy consumption is equal to the transmit power multiplied by the transmission time, expressed as follows:
Correspondingly, the delay and receive energy consumption for downloading the computation result are respectively expressed as follows:
where
represents the receiving power of the user.
2.3. Task Offloading Model
In this paper, we investigate the collaborative execution of tasks between XR user terminals and the MEC server. We use to denote the offloading strategy for the subtask, where indicates that the subtask is executed on the user terminal, and indicates it is executed on the MEC server. We specified earlier that the entry subtask can only be executed on user equipment, so that .
(1) Local Execution: Let
represent the computational resources of the
th user. When the
th subtask of the
th user is executed on the user terminal, the required computation time is expressed as follows:
Before the
th subtask begins execution, all its predecessor subtasks must have already been completed, and their computation results must have been returned to the user terminal. Therefore, the readiness time for
th subtask is expressed as follows:
where
and
respectively represent the finish times of subtasks executed on the user terminal and MEC server, and
represents the earliest available time for downlink communication between the
th user and the base station. In other words, if the downlink is currently transmitting data for other subtasks, it is necessary to wait until the transmission of those subtasks is completed before transmitting the data for the current subtask. Therefore, when the
th subtask of the
th user is executed on the user terminal, the corresponding finish time
and terminal energy consumption
[
23] are respectively expressed as follows:
where
represents the earliest available time for the user processor, meaning that, if there are other subtasks currently executing, it is necessary to wait for their completion before processing the current subtask.
is an energy factor that is contingent upon the CPU chip architecture.
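The local execution time and energy model can be sketched as follows. The energy term follows the per-cycle kappa·f² model cited above ([23]); kappa and all numeric values here are illustrative assumptions.

```python
def local_execution(d_in_bits, cycles_per_bit, f_local_hz, kappa=1e-27):
    # time = total CPU cycles / local frequency; energy uses the common
    # kappa * f^2 per-cycle model ([23]); kappa is chip-dependent and the
    # default value here is purely illustrative.
    cycles = d_in_bits * cycles_per_bit
    t_exec = cycles / f_local_hz
    e_exec = kappa * (f_local_hz ** 2) * cycles
    return t_exec, e_exec

t, e = local_execution(1e6, 500, 1e9)  # 1 Mbit, 500 cycles/bit, 1 GHz CPU
print(t, e)                            # 0.5 s and about 0.5 J
```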
(2) MEC Server Execution: Using
to represent the total computational resources of the MEC server, the computational resources that users accessing the server can obtain are expressed as follows:
When the
th subtask of the
th user is executed on the MEC server, the required computation time is expressed as follows:
The readiness time for the subtask is expressed as follows:
where
represents the earliest available time for uplink communication. Therefore, when the
th subtask is executed on the MEC server, the corresponding finish time
is expressed as follows:
where
represents the earliest available time for computational resources that the MEC server allocates to the user. In this paper, we do not consider the energy consumption associated with executing XR tasks on MEC servers, as MEC servers are typically powered directly by alternating current and have sufficient energy resources to handle offloaded tasks.
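The finish-time bookkeeping shared by both execution modes can be sketched as follows: a subtask starts at the later of its data-ready time and the earliest idle time of its processor (the UE CPU or the allocated MEC share). The numbers below are illustrative.

```python
def schedule_subtask(ready_t, processor_free_t, cycles, f_hz):
    # start = max(data-ready time, earliest processor availability);
    # finish = start + cycles / frequency (same rule for UE and MEC sides).
    start = max(ready_t, processor_free_t)
    return start, start + cycles / f_hz

# data ready at t = 5 ms, but the allocated MEC share is busy until t = 8 ms;
# 2e8 cycles at an effective 4 GHz share then take 50 ms
start, finish = schedule_subtask(0.005, 0.008, 2e8, 4e9)
print(start, finish)  # starts at 8 ms, finishes about 50 ms later
```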
3. Problem Formulation
Our goal is to simultaneously optimize the channel access strategy and task offloading strategy, aiming to enhance the energy conversion efficiency of user equipment while ensuring the satisfaction of user QoE requirements. Based on the analysis in
Section 2, the total finish time and total energy consumption for
th user to complete the task are expressed as follows:
where
is an indicator function and
represents the set of successor subtasks that depend on the
th subtask. The finish time for XR task depends on the exit subtask that returns the computation results to the user equipment at the latest. The total energy consumption is composed of the following parts.
The computational energy consumption incurred by the execution of subtasks on user equipment.
The transmitting energy consumption when the user transmits the computation results to the base station.
The receiving energy consumption when the user receives the computation results from the base station.
According to International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Rec. P.1203.1 [
24], we use the mean opinion score (MOS) as the measurement standard for QoE. Generally, MOS is divided into five levels: bad, poor, fair, good, and excellent, corresponding to ratings of one to five, respectively. We model the MOS of the
th user as follows [
25]:
where
and
are constants to ensure that MOS falls within a reasonable range of values, and
represents the discrete mapping to the five QoE levels.
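Because the exact MOS expression follows [25] and its constants are elided above, the sketch below uses a hypothetical logarithmic delay-to-MOS mapping clipped to the standard 1–5 scale; the constants a, b, and t_ref are placeholders, not the paper's values.

```python
import math

def mos_score(total_delay_s, a=5.0, b=1.2, t_ref=0.02):
    # Hypothetical mapping: full score at the 20 ms motion-to-photon target,
    # decaying logarithmically with delay, then discretized to 1..5.
    raw = a - b * math.log(max(total_delay_s / t_ref, 1.0))
    return int(min(5, max(1, round(raw))))

print(mos_score(0.02), mos_score(0.2), mos_score(10.0))  # 5 2 1
```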
Our research objective is to maximize energy conversion efficiency while ensuring a satisfactory QoE for XR users. As in [
26,
27], we have selected optimization goals that achieve the desired QoE for users, specifically the weighted sum of total task energy consumption and MOS, which can be expressed as follows:
where
is the minimum QoE threshold set by the
th user.
are scalar weights for MOS and energy consumption, respectively.
Let
represent the task offloading decisions for the user. To maximize the energy conversion efficiency, the joint optimization problem for communication channel access and task offloading studied in this paper can be formulated as follows:
Constraint (C1) indicates that the user can access a maximum of one channel at a time. Constraint (C2) places restrictions on the actions of XR users in selecting channels and offloading tasks. Constraint (C3) represents the minimum MOS threshold set by the user.
4. DRL-Based Joint Optimization
In this section, we initially model the optimization problem as an MDP. Subsequently, we introduce an algorithm that utilizes MADDPG to support XR users in making channel selections and task offloading decisions.
4.1. Background
Because it jointly involves channel access and task-offloading strategies and constitutes a multi-user competitive problem over mixed-integer variables, the optimization problem (24) is non-convex and NP-hard. Moreover, making joint optimization decisions in rapidly changing, highly dynamic wireless networks is challenging. In recent years, DRL has developed rapidly. It is an emerging artificial intelligence technology that has been employed to address a wide range of decision-making and computational problems, and it is particularly recognized as an effective tool for tackling non-convex, complex optimization problems in highly dynamic wireless environments. Given the sensitivity of XR applications to latency, it is crucial for user devices to make prompt decisions. MADDPG adopts an architecture of centralized training and decentralized execution to address this requirement: after training is completed, agents can make decisions independently, relying only on their own observations. Additionally, MADDPG enables each agent to learn its action policy cooperatively, without requiring global knowledge or a central controller. This allows agents to dynamically adjust their behavior in response to environmental changes, better adapting to complex environments and improving the ability to model nonlinear relationships in the task-offloading process toward global optimality. Therefore, we use the MADDPG algorithm to train the agents.
The MDP is widely used for modeling reinforcement learning problems. It is typically represented by a tuple . denotes the set of states, denotes the set of actions, denotes the set of state transition probabilities, denotes the reward, and is the reward discount factor. We model the joint optimization problem as the following MDP model.
(1) State Space: In our system, we assume that each XR device is running an XR application. The input data size of the
th user is
, and the local computational resources available are represented by
. The XR device acquires the channel gain
between itself and the base station using sensing technology. Additionally, the base station needs to broadcast the computational resources
of the MEC server to all users. Therefore, the observation of
th user can be represented as follows:
The state can be represented as the collective observation set of all users, denoted as .
(2) Action Space: In the edge network, XR devices need to select a channel for communication with the base station and decide whether each subtask should be computed on the local device or on the MEC server. The action for the
th user can be represented as follows:
Therefore, the set of actions can be represented as
.
(3) Reward Function: Our optimization objective is to maximize user energy conversion efficiency, so we define the reward for the
th user as follows:
where
is the punishment coefficient, controlling the magnitude of the punishment when the agent violates QoE constraint.
is a step function.
is the reward scaling factor, which can be used to adjust rewards without altering the reward function, facilitating faster learning and improved asymptotic performance [
28].
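Putting these pieces together, the per-user reward can be sketched as a scaled weighted sum of MOS and energy with a step penalty on QoE violation; the weights, penalty coefficient, and scaling factor below are illustrative, not the trained configuration.

```python
def user_reward(mos_val, energy_j, mos_min, w_mos=1.0, w_energy=1.0,
                penalty=10.0, scale=0.1):
    # Weighted sum of MOS and (negative) energy, minus a step penalty when
    # the agent violates the user's minimum QoE constraint; scale is the
    # reward scaling factor mentioned in the text.
    violated = mos_val < mos_min
    return scale * (w_mos * mos_val - w_energy * energy_j
                    - (penalty if violated else 0.0))

print(user_reward(4, 1.0, mos_min=3))  # constraint met
print(user_reward(2, 1.0, mos_min=3))  # constraint violated, penalized
```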
4.2. The Proposed Algorithm
Next, we will describe the specific details of the proposed algorithm. Based on the MADDPG framework, we treat each XR device as an agent. Each agent has two types of networks: policy network and value network. The policy network determines the next action based on the agent’s own observation, and the value network evaluates the goodness of the current action based on the global state and the actions of all agents. Additionally, target networks are used to mitigate the problem of the overestimation of Q-values during the training process. Using and to represent the policy network and target policy network, their parameters are and , respectively. Using and to represent the value network and target value network, their parameters are and , respectively.
MADDPG is an off-policy algorithm, so we can utilize experience replay to reuse past experiences. We denote the replay buffer as , which stores data like , representing the current system state, actions of all agents, the next system state, and rewards. Additionally, we use an -greedy policy to control agents’ interactions with the environment, facilitating exploration.
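A minimal replay buffer matching this description might look as follows; the tuple layout mirrors the text (state, joint actions, next state, reward), while the capacity and batch-size defaults echo the simulation settings of Section 5 and are otherwise assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay storing (s, a, s_next, r) tuples."""
    def __init__(self, capacity=2048):
        self.buf = deque(maxlen=capacity)   # oldest entries evicted first

    def store(self, state, actions, next_state, reward):
        self.buf.append((state, actions, next_state, reward))

    def ready(self, batch_size=64):
        return len(self.buf) >= batch_size

    def sample(self, batch_size=64):
        # uniform sampling without replacement within one mini-batch
        return random.sample(self.buf, batch_size)

buf = ReplayBuffer(capacity=128)
for i in range(200):
    buf.store(i, i, i + 1, 0.0)
print(len(buf.buf), buf.ready(), len(buf.sample(64)))  # 128 True 64
```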
(1) Centralized Training: During training, we deploy policy networks to the MEC server. At this point, XR devices cannot make decisions independently and must follow the instructions sent by the base station. Training begins when a sufficient amount of data has been collected in the replay buffer. In each training iteration, a mini-batch of data is randomly sampled from
to train the neural networks. The target value network provides the expected Q-value
, based on the action generated by the target policy network, as follows:
where
is the reward discount factor. The value network updates its parameters
through gradient descent using a mean squared error loss function, as follows:
where
is the mini-batch size. With the assistance of the value network, the policy network updates its parameters
using gradient ascent, as follows:
Finally, the parameters of the target policy network and target value network are softly updated, as follows:
where
is the soft update rate.
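The soft update in (32) and (33) reduces to an element-wise interpolation; below is a framework-free sketch over flat parameter lists (an implementation assumption, since the paper does not specify one).

```python
def soft_update(target_params, online_params, tau=0.01):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target,
    # applied element-wise; tau is the soft update rate from the text.
    return [tau * w + (1 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

print(soft_update([0.0, 1.0], [1.0, 1.0]))  # target drifts slowly toward online
```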
(2) Decentralized Execution: After training is completed, the value network is no longer needed. Then, we deploy the policy network to the corresponding XR devices. At this point, the th user can make decisions independently for channel access and task offloading based on its own observations, achieving real-time decision-making.
Based on the MDP and analysis described above, the training process for the proposed algorithm is shown in Algorithm 1. The MADDPG algorithm utilizes actor–critic (AC) networks with deep neural networks composed of input layers, output layers, and hidden layers. The actor network takes as input a state space composed of input data size, local available resources, channel gains, and MEC available resources, producing channel selection and task-offloading decisions as outputs. The critic network takes the concatenation of state and action spaces as inputs and outputs Q-values. During each training step, MADDPG samples and stores experiences from
agents, then batches
experiences for training each agent.
Algorithm 1. Training Process of Our Proposed Model
1:  Initialize the experience replay buffer ;
2:  Initialize the edge network environment with agents;
3:  for episode = 1, 2, … do
4:      for do
5:          for each UE do
6:              Observe an observation from the environment;
7:              Select an action based on -greedy and policy network;
8:          end for
9:          Execute action ;
10:         Observe reward and next state ;
11:         Store in buffer ;
12:         if buffer is ready and time to learn then
13:             Randomly sample a mini-batch of samples from ;
14:             for each UE do
15:                 Update the value network by minimizing the loss function in (30);
16:                 Update the policy network using gradient ascent with (31);
17:                 Update target network parameters with (32) and (33);
18:             end for
19:         end if
20:     end for
21: end for
The primary factor influencing time complexity is the dimensionality of the network structure. The computational complexity required to train each actor with a segment of experiences is
, where
represents the dimensionality of the state space,
represents the number of neurons in the hidden layers, and
represents the dimensionality of the action space. Similarly, the time complexity of the critic network can be represented as
. Given that the target AC networks share the same network structure as the AC networks, the algorithmic complexity for a single agent can be expressed as
. Ultimately, the overall time complexity of the algorithm is determined to be
[
29].
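The per-sample complexity above is dominated by the dense-layer multiplications, and counting the parameters of such a network is a quick sanity check. The hidden widths below (64/128/64) follow the simulation settings of Section 5, while the state and action dimensions are illustrative assumptions.

```python
def mlp_param_count(layer_dims):
    # Weights (n_i * n_{i+1}) plus biases (n_{i+1}) for each dense layer,
    # matching the O(sum n_i * n_{i+1}) per-sample cost stated in the text.
    return sum(layer_dims[i] * layer_dims[i + 1] + layer_dims[i + 1]
               for i in range(len(layer_dims) - 1))

# hypothetical actor: 8-dim observation -> 64 -> 128 -> 64 -> 7-dim action
print(mlp_param_count([8, 64, 128, 64, 7]))  # 17607 parameters
```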
5. Simulation Results and Analysis
In this section, we initially present the experimental setup and specific simulation parameter configurations, followed by the presentation of simulation results and the subsequent analysis.
5.1. Simulation Settings
We consider a cellular cell with four XR users, where the base station deploys a MEC server and six orthogonal channels, and
user devices are randomly generated within the coverage area of the base station. Each XR user runs an XR application with a data rate of 30 Mbps at 60 frames per second. We assume that processing one frame constitutes an XR task, and the size of XR video frames
is modeled as truncated Gaussian distributions, following the guidelines outlined in the third Generation Partnership Project Technical Report (3GPP TR) 38.838 [
30]. Unless otherwise specified, we perform XR task splitting according to the topology of Task 2 in
Figure 2. Other parameters in the simulation scenario are presented in
Table 1.
The proposed algorithm utilizes a neural network with three hidden layers employing Rectified Linear Unit (ReLU) activation functions, featuring 64, 128, and 64 neurons in the respective layers. The policy and value networks employ learning rates of 0.01 and 0.0008, respectively, updated using the Adam gradient optimization method. Target network parameters are updated softly at the rate of 0.01. The of -greedy policy is 0.9. The reward discount factor is set to 0.99. The capacity of the replay buffer is 2048, and it samples 64 data points in a batch. The training process spans 2000 episodes, each comprising 20 steps.
5.2. Performance of Our Proposed Algorithm
In the simulation, we compared our proposed algorithm with the following schemes or algorithms.
(1) Local: All XR tasks are executed on the local devices.
(2) Random Access and Execute in MEC (RA-MEC): XR users randomly select a channel to access the base station, after which all subtasks are offloaded to the MEC server for execution.
(3) Strategy Access and Execute in MEC (SA-MEC): In the SA-MEC scheme, our proposed algorithm’s access strategy is used to minimize interference, followed by the offloading of all subtasks to the MEC server for execution.
(4) Independent Q-Learning algorithm (IQL): This scheme employs the independent Q-learning algorithm [
31] to train channel access and task offloading strategies. Independent Q-learning is a decentralized multi-agent reinforcement learning algorithm where each agent learns a policy based solely on its own actions and observations, treating other agents as part of the environment.
(5) Double Deep Q-Network (DDQN): The approach utilizes DDQN [
32] to train channel access and task offloading strategies. This algorithm is an enhanced version of the deep Q-network (DQN) algorithm. By alternating training between two neural networks—one for selecting optimal actions and the other for evaluating the Q-values of these optimal actions—the method achieves individual agent optimization.
The trends in rewards and losses of our proposed algorithm are shown in
Figure 3. Due to the non-stationary nature of the environment, rewards exhibit significant fluctuations. To better observe the convergence trend, we have smoothed the rewards, as indicated by the red line. From the rewards and losses, it can be observed that the channel access and task-offloading strategies continuously improve with increasing training epochs, and that the algorithm tends to converge after approximately 700 iterations.
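The paper does not state which smoothing was applied to the reward curve; an exponential moving average is one common choice and is sketched here, with the smoothing factor alpha as an assumption.

```python
def smooth_rewards(rewards, alpha=0.05):
    # Exponential moving average: ema_t = alpha*r_t + (1-alpha)*ema_{t-1},
    # seeded with the first reward, for visualizing the convergence trend.
    smoothed, ema = [], rewards[0]
    for r in rewards:
        ema = alpha * r + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed

print(smooth_rewards([0.0, 10.0, 10.0], alpha=0.5))  # [0.0, 5.0, 7.5]
```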
The performance comparisons are shown in
Figure 4 and
Figure 5 in terms of reward and energy consumption. From
Figure 4, regardless of whether the MEC server has limited or abundant computational resources, the MADDPG algorithm consistently achieves a higher reward. When the MEC server has sufficient computing resources and channel conditions are favorable, offloading all subtasks to the MEC server is evidently the optimal solution. However, because the SA-MEC scheme only optimizes the channel access strategy without considering the impact of channel access on task offloading, its performance consistently remains lower than that of the proposed algorithm. The RA-MEC scheme, which employs a random access strategy, may suffer from poor performance due to the significant interference that XR devices encounter when communicating with the base station, leading to increased task-processing costs. As for the IQL algorithm, agents ignore the observations and actions of other agents, so the state transition probabilities are non-stationary, making the algorithm less effective than ours. The DDQN algorithm builds upon IQL by adding a separate Q-value network to estimate the optimal action, reducing bias in action-value estimation and improving the accuracy of optimal action selection to some extent. Therefore, it tends to outperform IQL but, lacking inter-agent cooperation, falls slightly short of MADDPG.
From
Figure 5, as the computing resources of MEC servers continue to increase, algorithms based on MADDPG can offload more subtasks to MEC servers for execution, thereby continuously reducing the energy consumption of user devices. For users located farther from the base station, our proposed algorithm tends to execute subtasks locally. As a result, the energy consumption of our proposed algorithm is higher than the schemes that offload all subtasks. In the RA-MEC scheme, XR devices do not perform subtasks, resulting in zero computational energy consumption. However, due to communication interference, the transmission energy consumption is significantly increased, resulting in an overall high energy consumption.
For the different task partitioning approaches shown in
Figure 2, when the computing resources of the MEC server are set at 12 GHz, the relative rewards achieved by each scheme are illustrated in
Figure 6. When the dependencies between subtasks are relatively simple, there is little difference in the rewards achieved by the various schemes. The proposed algorithm achieves higher rewards when splitting tasks according to task 2 or task 3. This is because, in task 1, all subtasks must be executed sequentially, so only one computing node, on the MEC server or a user device, is working at any given time. In contrast, tasks 2 and 3 allow for the parallel execution of subtasks, thus offering a higher upper limit on application performance. In task 3, the increased number of subtasks and more complex dependencies make it more challenging to formulate channel access and task-offloading strategies; therefore, compared to task 2, task 3 exhibits some degree of performance decline. As tasks become increasingly complex, the rewards obtained by the proposed MADDPG algorithm consistently outperform those of the other approaches.
In summary, our proposed algorithm effectively meets users’ MOS requirements while reducing XR device energy consumption and improving energy conversion efficiency. It demonstrates strong adaptability, as it consistently delivers favorable outcomes across varying MEC server computing resource levels. Additionally, our proposed algorithm demonstrates notable proficiency in handling various XR task-partitioning methods, particularly for subtasks that involve complex dependencies.
6. Conclusions
In this paper, we focus on the operation of XR devices in edge networks, where tasks are offloaded to MEC servers using wireless communication. Our objective is to enhance service quality and improve user experiences. Within our system architecture, we conduct joint optimization of communication channel access and task offloading, aiming to maximize energy conversion efficiency while ensuring that user QoE requirements are met. To address this joint optimization problem, we have introduced an algorithm based on MADDPG, designed to optimize the decision-making process for channel selection and task offloading. The simulation results have demonstrated the excellent performance of our algorithm across different XR task-partitioning methods. It effectively meets the user’s QoE demands and enhances energy conversion efficiency, thereby validating the effectiveness of our proposed approach.
This paper only investigates the problem of task offloading for multiple XR users under a single base station. When the MEC servers deployed locally at the base station are heavily loaded, offloading tasks to idle MEC servers at nearby base stations is another strategy. However, this inevitably increases the task completion latency. Therefore, in the future, we will explore the collaborative completion of computing tasks by multiple XR users and multiple MEC servers to achieve better XR application performance.