Article

Collaborative Optimization Strategy for Dependent Task Offloading in Vehicular Edge Computing

1 School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China
2 Liaoning Liaohe Laboratory, Shenyang 110033, China
3 Shenyang Key Laboratory of Advanced Computing and Application Innovation, Shenyang 110870, China
4 School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110870, China
5 Shenyang Industrial Smart Chip and Network System Innovation Application Key Laboratory, Shenyang 110084, China
6 School of Information and Electronic Engineering, Advanced Institute of Industrial Technology, Tokyo 140-0011, Japan
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(23), 3820; https://doi.org/10.3390/math12233820
Submission received: 19 October 2024 / Revised: 23 November 2024 / Accepted: 1 December 2024 / Published: 2 December 2024
(This article belongs to the Special Issue Advanced Computational Intelligence in Cloud/Edge Computing)

Abstract

The advancement of the Internet of Autonomous Vehicles has facilitated the development and deployment of numerous onboard applications. However, the delay-sensitive tasks generated by these applications present enormous challenges for vehicles with limited computing resources. Moreover, these tasks are often interdependent, preventing parallel computation and severely prolonging completion times, which results in substantial energy consumption. Task-offloading technology offers an effective solution to mitigate these challenges. Traditional offloading strategies, however, fall short in the highly dynamic environment of the Internet of Vehicles. This paper proposes a task-offloading scheme based on deep reinforcement learning to optimize the strategy between vehicles and edge computing resources. The task-offloading problem is modeled as a Markov Decision Process, and an improved twin-delayed deep deterministic policy gradient algorithm, LT-TD3, is introduced to enhance the decision-making process. The integration of LSTM and a self-attention mechanism into the LT-TD3 network boosts its capability for feature extraction and representation. Additionally, considering task dependency, a topological sorting algorithm is employed to assign priorities to subtasks, thereby improving the efficiency of task offloading. Experimental results demonstrate that the proposed strategy significantly reduces task delays and energy consumption, offering an effective solution for efficient task processing and energy saving in autonomous vehicles.

1. Introduction

With the ongoing development and application of the Internet of Vehicles (IoV) and autonomous driving technology, the Internet of Autonomous Vehicles (IoAV) has gained increasing attention [1,2]. This concept includes advanced automation, intelligent sensing, and decision-making capabilities, and it presents technological complexities and cost challenges. Meanwhile, the continued advancement of IoAV has enabled the deployment of a wide range of related applications such as augmented reality (AR), real-time video analysis, and human behavior recognition [3,4]. These applications are compute-intensive and delay-sensitive, requiring substantial computing resources for processing multi-dimensional and diverse data, posing challenges for deployment on resource-limited vehicles. Moreover, these applications exacerbate inter-task dependencies, extending overall task execution times and increasing energy consumption. This can prevent tasks from being completed within designated times, negatively impacting the quality of experience (QoE) and potentially endangering lives. For instance, real-time video analysis often relies on preliminary information from human action recognition for downstream tasks, which must wait until the information arrives. If the task is not completed within the specified time, it will affect the subsequent decision-making in the autonomous driving vehicle, which may lead to serious safety accidents.
Mobile cloud computing (MCC), a technical mode for handling large-scale dynamic data streams, provides a viable solution to the challenges mentioned above [5]. This technology divides vehicle tasks into multiple subtasks and offloads them to cloud servers with powerful computing capabilities for execution, offering mobile users ample computing resources and efficient online resource management. However, for reasons of cost and stable energy supply, cloud computing centers are often concentrated in specific geographic locations, typically far from densely populated areas. This geographical separation inevitably leads to higher communication costs and increased latency, which can compromise the reliability of autonomous vehicles and heighten safety risks for users [6,7]. Mobile edge computing (MEC), which provides computing services by deploying distributed infrastructure near data sources, addresses this shortcoming of MCC [8,9,10]. In IoAV, MEC servers are deployed in roadside units (RSUs) along roads in dense traffic areas and accident-prone sections, or at base stations (BSs) that cover large areas. Through vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V) communication, autonomous vehicles offload computationally intensive tasks to edge devices or nearby vehicles with spare resources, which spreads the computational load, reduces latency and bandwidth consumption, and improves task execution efficiency [11,12]. However, edge servers and service vehicles with constrained computing resources often struggle to meet the demands of large-scale applications in scenarios with dense traffic flow [13]. This limitation can lead to communication congestion and network paralysis, which seriously affect traffic order and the safety of autonomous vehicles.
Moreover, in-vehicle applications such as autonomous driving, Internet of Vehicles (IoV) services, and advanced driver assistance systems (ADAS) exhibit significant task dependencies. Improper offloading strategies can directly impact system performance, response speed, and resource utilization [14]. Therefore, efficiently utilizing edge-side resources and ensuring task execution priority has become a critical challenge.
The task offloading of autonomous vehicles is commonly framed as a decision-making problem [15]. Traditional methods such as dynamic programming (DP) and the genetic algorithm (GA) struggle to adapt to dynamically changing environments, making it difficult to converge to global optima in complex and highly dynamic IoAV scenarios. Additionally, these traditional methods lack energy optimization and learning capabilities, and they suffer from high computational complexity and increased latency when multiple complex dependencies are involved. This can be catastrophic for tasks with strict latency requirements [16]. Deep reinforcement learning (DRL), a machine learning approach based on trial-and-error learning, has opened a new research direction for offloading decision-making [17]. It can learn complex dependencies and adaptively adjust the task-offloading strategy through continuous interaction with the environment. This enables flexible real-time scheduling and allocation in highly dynamic and uncertain IoAV scenarios, effectively handling offloading decision problems with non-convex optimization constraints. As task scenarios grow more complex, traditional single neural networks (NNs) often face limitations in dynamic, resource-constrained environments. The multi-layer perceptron (MLP), in particular, has limited learning ability and struggles with high-dimensional and nonlinear problems [18]. In the highly dynamic, task-dependent vehicle network, these traditional networks often lack strong adaptability and generalization and fail to produce accurate operational strategies in response to emergencies, thus reducing system performance and efficiency. In scenarios with high safety requirements, such as autonomous driving, they may be incapable of responding in time to unexpected situations, compromising the safety and reliability of the system.
Therefore, optimizing the DRL network architecture to enhance adaptability and robustness is essential for managing complex and dynamic vehicle task-offloading scenarios.
In this context, we propose an innovative task-offloading scheme that models the vehicle task-offloading optimization issues as a constrained Markov decision process (CMDP). Subsequently, we introduce an improved version of the twin-delayed deep deterministic policy gradient (TD3) algorithm, termed LT-TD3, which integrates a long short-term memory (LSTM) network [19] and a self-attention mechanism [20] to enhance network performance and efficiency. Considering the strong task dependencies, we employ a topological sorting algorithm to allocate execution priorities for subtasks, thereby optimizing task offloading efficiency.
The main contributions of this paper are as follows:
  • We consider vehicle task offloading with task dependencies in an IoAV scenario within a densely populated urban area, and we model the problem as a Markov decision process to minimize the delay and improve the offloading efficiency.
  • The proposed innovative deep reinforcement learning (DRL) algorithm, LT-TD3, integrates the TD3 algorithm with LSTM networks and the self-attention mechanism to enhance algorithm performance and efficiency.
  • To address the issue of strongly dependent tasks in the offloading process, we employ a topological sorting algorithm to assign offloading priorities to subtasks, optimizing the task-offloading sequence and reducing the impact of task dependencies on the overall offloading efficiency.
  • Simulation results demonstrate the effectiveness of the proposed algorithm in highly dynamic, densely populated scenarios with strongly dependent tasks. The proposed algorithm significantly improves convergence, energy consumption, and latency compared with baseline methods.
The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the system model and problem formulation. Section 4 presents the design and implementation of LT-TD3. Section 5 evaluates the simulation results of the proposed approach and compares it with existing methods. Section 6 concludes with a summary of the article and discusses potential future research areas.

2. Related Work

2.1. Mobile Edge Offloading

The development of IoAV has led scholars to explore optimal task-offloading strategies. Feng Zeng et al. proposed a volunteer-assisted vehicular edge computing model that encourages volunteer vehicles to assist VEC servers. They identified the optimal strategy based on the Stackelberg game and then designed a fast search algorithm based on a genetic algorithm to determine the best pricing strategy for VEC servers [21]. To reduce the delay of data-intensive tasks on edge service nodes, a novel task allocation optimization framework, Folo, was introduced. This framework combines linear programming and particle swarm optimization for dynamic task allocation, shortening the average service delay while reducing quality loss [22]. Considering various vehicular network scenarios, ref. [23] proposed a joint task-offloading and resource allocation scheme to optimize resource management in vehicular edge computing (VEC). Focusing on the diversity and flexibility of tasks and scenarios, a heuristic algorithm based on generalized Benders decomposition (GBD) and the reformulation linearization (RL) technique was designed to minimize the total task processing delay, and it demonstrated good generalization and low complexity. In addition, for cascade optimization, the improved genetic algorithm (IGA) [24], a two-stage heuristic optimization algorithm based on genetic algorithms [25], and a bilevel optimization approach, BiJOR [26], have shown superior performance.
Although traditional task-offloading algorithms demonstrate good real-time performance and low complexity, they fall short in addressing multi-objective optimization and global dynamic adaptation in highly dynamic and complex IoAV environments. Therefore, developing a novel algorithm is essential to overcome these limitations and enhance the efficiency and stability of task offloading.

2.2. DRL for Mobile Edge Offloading

The emergence of DRL offers an innovative solution for optimizing task-offloading strategies. Its dynamic adaptive learning and global optimization capabilities can effectively tackle the complexities and variations of the autonomous driving environment, facilitating the intelligent advancement of task-offloading technology in IoAV [27].
Ning et al. [28] proposed a three-layer offloading framework to reduce energy consumption while meeting the latency requirements for in-vehicle entertainment service applications by leveraging the computing resources of moving and parked vehicles. Faced with the complex resource allocation problem, the issue was decomposed into traffic redirection and task-offloading decisions, which were addressed using the Edmonds–Karp algorithm and a DDQN-based algorithm. Sun et al. [29] studied task migration in vehicular clouds to minimize total system response time and developed DQN-based algorithms. Qi et al. [30] focused on the issue of resource scheduling and offloading under multiple constraints, training across multiple edge nodes using the A3C algorithm to maximize long-term system rewards. Lee et al. [31] addressed service delay issues by utilizing the computing resources of parked vehicles, extracting time and location resource patterns through RNN-based methods, and allocating resources through the PPO algorithm. Recent improvements to traditional single-DRL networks have further enhanced algorithm performance and efficiency. Ref. [32] proposed a task-offloading method based on meta-reinforcement learning, which can quickly adapt to the environment under small samples, reducing system network traffic and service delay. Ref. [33] integrated LSTM with the traditional deep Q-network (DQN) to substantially reduce task loss rates and average latency. In addition, with the development of neural networks (NNs), various network architectures integrated with deep reinforcement learning (DRL) have achieved significant advancements [34,35,36,37].
However, the aforementioned studies may suffer from slow convergence and reward instability during training due to traffic density and task characteristics. Moreover, they overlook task interdependencies during scheduling, where the execution order and task prioritization are critical for tasks with strong dependencies. Therefore, it is necessary to develop an innovative task-offloading framework that enhances algorithmic convergence and stability, while accounting for task dependencies to optimize offloading decisions and minimize latency.

2.3. Adaptive DRL for Mobile Edge Offloading

In addition, considering the limitations posed by vehicle mobility, energy efficiency, and network fluctuations in dynamically complex environments, enhancing the dynamic adaptability of the algorithm is essential. Considering the movement state of vehicles, Cheng et al. [38] combined deep adaptive reinforcement learning with an improved BinBRO algorithm to utilize parked vehicles to allocate limited fog resources, aiming to minimize service latency. Focusing on the dynamics and heterogeneity of vehicles, Ref. [39] proposed a graph neural network-augmented deep reinforcement learning strategy, GA-DRL. This strategy employs a multi-head graph attention network (GAT) to extract features and augment them into a double deep Q-network learning module, which achieves the timely scheduling of dynamic vehicle tasks. To improve the efficiency of resource management under the constraints of vehicle mobility and task arrival randomness, ref. [40] constructed an end-edge-cloud offloading model in a bidirectional road scenario and proposed a deep deterministic policy gradient-based adaptive computation offloading and power allocation algorithm to improve the decision-making performance in the dynamic heterogeneous VEC environment. Moreover, the deployment of UAVs to assist vehicle edge computing in highly dynamic environments also involves addressing the adaptive challenges of the algorithm. The success rate of task offloading is raised by enhancing the adaptability of the algorithm [41]. By combining deep reinforcement learning (DRL) with traditional algorithms, the adaptive capability of the algorithm in complex scenarios can be significantly enhanced [42,43]. It improves the task completion rate, reduces latency, and optimizes resource allocation in highly dynamic and heterogeneous environments.
In highly dynamic environments with numerous time-sensitive tasks, approaches that rely on traditional algorithms to improve dynamic adaptability may still struggle to adapt fully. These methods commonly lack the flexibility and real-time decision-making needed to handle sudden events or emergencies, which can cause performance degradation and increased time delays. Therefore, it is necessary to explore the internal adaptive optimization of the network to improve the decision-making efficiency and performance of the algorithm.

3. System Model and Problem Formulation

The task dependence and overall utilization optimization of available computing resources are considered when constructing the system model for autonomous vehicles.

3.1. Vehicle Network Model

The designed cloud environment for autonomous vehicles mainly involves RSUs, autonomous vehicles, conventional vehicles, and a cloud control center, as shown in Figure 1. The cloud control center is deployed on the RSU, which communicates wirelessly with vehicles through the V2I transmission mode and is equipped with an MEC server to augment computational capabilities. Moreover, we introduce the concepts of resource units and resource pools to better represent the computing resources within the autonomous vehicle cloud. The resource unit is defined as the smallest unit of resources in the autonomous vehicle cloud, and all resource units form a resource pool, which is centrally managed by the cloud control center.
Vehicles that generate computation tasks within the coverage area of the vehicle cloud are referred to as task vehicles. When a task vehicle with limited computing resources potentially fails to meet high task requirements, it uploads task request information to the cloud control center via a wireless link. The control center evaluates its capability to process the task based on the received information and the available resources in the resource pool. If the task can be processed, the task vehicle uploads the task details to the autonomous vehicle cloud, and the control center allocates the necessary resource units for processing.
In addition, the scenario involves more complex vehicle tasks with dependencies; that is, the completion of a certain task depends on the data provided by other tasks. The set of tasks to be processed in the autonomous vehicle cloud is denoted as $K = \{k_1, k_2, \ldots, k_n\}$; each task has the triad of properties $T_n = \{D_n, S_n, t_n^{max}\}$, where $D_n$ denotes the task data size, $S_n$ denotes the number of resources required to complete the task, and $t_n^{max}$ denotes the maximum tolerable delay of the task. Each task in $K$ can in turn be expressed as a set of subtasks $A_n = \{A_{n,1}, A_{n,2}, \ldots, A_{n,t_n}\}$, where $A_{n,t_n}$ represents the $t_n$-th subtask of task $k_n$, and $t_n$ represents the total number of subtasks of task $k_n$. To simplify representation, we express each subtask $A_{n,i}$ with a triple $\{d_{n,i}, s_{n,i}, t_{n,i}^{max}\}$. Here, $d_{n,i}$ denotes the data size of subtask $A_{n,i}$, $s_{n,i}$ represents the computing resources required by subtask $A_{n,i}$, and $t_{n,i}^{max}$ represents the maximum delay that $A_{n,i}$ can tolerate.

3.2. Task Dependency Model

A directed acyclic graph (DAG) is a structure consisting of nodes connected by directed edges in which no cycles are present; that is, it is impossible to start at one node, follow a sequence of directed edges, and return to the same node. This property enables us to model tasks in a way that inherently respects the order of dependencies. Because a DAG can comprehensively describe a task processing sequence, we use it to model vehicle tasks with dependencies. In this paper, a task with dependencies is denoted as $Y_n = (A_n, H_n)$, where $A_n$ represents the subtask set, with each subtask being a vertex of the DAG, and $H_n$ denotes the set of dependencies, with each dependency between two subtasks forming a DAG edge. A DAG edge $h(A_{n,i}, A_{n,j}) \in H_n$, with $A_{n,i}, A_{n,j} \in A_n$, represents the dependency between subtasks $A_{n,i}$ and $A_{n,j}$. In this case, $A_{n,j}$ needs to obtain the output data from $A_{n,i}$ to proceed; namely, $A_{n,i}$ is the predecessor task of $A_{n,j}$, and $A_{n,j}$ is the successor task of $A_{n,i}$. A subtask is called the entry or start task if it has no predecessor task. Similarly, a subtask is called the exit or end task if it has no successor task. Figure 2a illustrates a vehicle task with dependencies using an augmented reality (AR) program. The video source and renderer cannot be offloaded and must be processed locally within the vehicle, whereas the tracker, mapper, and object recognizer can be offloaded. The dependencies between components are visible. For instance, the object recognizer requires the mapper's output to execute, and its output is subsequently used as input for the renderer. Figure 2b shows the DAG modeling of vehicle-dependent tasks, where task 1 is the predecessor to task 4, and task 5 is the successor to task 4. In the DAG, task 1 is the start task, while task 13 is the end task.
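The ordering that a DAG imposes on subtasks can be illustrated with a short sketch. The Python function below is a minimal version of Kahn's algorithm, one common way to topologically sort a DAG; this section does not state which variant the authors use. The name `topological_order`, the integer subtask labels, and the five-subtask example graph are assumptions made for illustration and are not taken from Figure 2b.

```python
from collections import deque


def topological_order(num_tasks, edges):
    """Return one dependency-respecting order of subtasks 1..num_tasks.

    edges is a list of (predecessor, successor) pairs, i.e. the DAG edges.
    """
    successors = {t: [] for t in range(1, num_tasks + 1)}
    in_degree = {t: 0 for t in range(1, num_tasks + 1)}
    for pred, succ in edges:
        successors[pred].append(succ)
        in_degree[succ] += 1

    # Entry (start) tasks have no predecessor and can be scheduled first.
    ready = deque(t for t in sorted(in_degree) if in_degree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing a task releases every successor whose inputs are all done.
        for succ in successors[task]:
            in_degree[succ] -= 1
            if in_degree[succ] == 0:
                ready.append(succ)

    if len(order) != num_tasks:
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical 5-subtask DAG: 1 -> {2, 3} -> 4 -> 5
example_edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
example_order = topological_order(5, example_edges)
```

In this sketch, the position of a subtask in the returned list can serve as its offloading priority: a subtask never appears before any of its predecessors.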

3.3. Transport Model and Computation Model

In autonomous vehicle cloud task offloading, vehicles typically offload tasks to RSUs through V2I communication, in which the task communication link remains temporarily stable. The transmission rate in the V2I mode can be expressed using the Shannon formula:
$$R_{V2I} = W \log_2\left(1 + \frac{P_t h^2}{N_{V2I}\,(d_{TV,RSU})^{v}}\right)$$
where $W$ represents the bandwidth, $P_t$ represents the transmission power, $h$ and $v$ represent the channel gain and the path loss exponent, respectively, $N_{V2I}$ denotes the noise power, and $d_{TV,RSU}$ represents the distance between the task vehicle and the RSU. In this case, the transmission time of the task that the task vehicle needs to offload is as follows:
$$t_c = \frac{d_i}{R_{V2I}}$$
where $d_i$ denotes the data size of the subtask. The execution time $t_i^{proc}$ for task $i$ is denoted as follows:
$$t_i^{proc} = \frac{1}{u_N}\, t_{unit}$$
If the task is processed locally, the resource pool does not allocate resources for it; instead, only the local computing resources of the task vehicle are used for processing. The local processing time, $t_i^{loc}$, is denoted by the following:
$$t_i^{loc} = \frac{S_n}{f_{loc}}$$
where $S_n$ denotes the number of resources required to complete the task, and $f_{loc}$ denotes the locally available resources.
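As a worked illustration of the delay model, the following sketch evaluates the V2I rate, the transmission time, and the local processing time. The function and parameter names are hypothetical, and the numeric values in the example are placeholders chosen so the arithmetic is easy to follow; they are not the paper's simulation settings.

```python
import math


def v2i_rate(bandwidth, tx_power, channel_gain, noise_power, distance, path_loss_exp):
    """Shannon rate R_V2I = W * log2(1 + P_t h^2 / (N_V2I * d^v))."""
    snr = tx_power * channel_gain ** 2 / (noise_power * distance ** path_loss_exp)
    return bandwidth * math.log2(1.0 + snr)


def transmission_time(data_size, rate):
    """t_c = d_i / R_V2I."""
    return data_size / rate


def local_time(required_resources, local_resources):
    """t_loc = S_n / f_loc."""
    return required_resources / local_resources


# Placeholder values: an SNR of exactly 1 gives log2(2) = 1, so the rate equals W.
example_rate = v2i_rate(bandwidth=1e6, tx_power=1.0, channel_gain=1.0,
                        noise_power=1.0, distance=1.0, path_loss_exp=2.0)
example_t_c = transmission_time(data_size=2e6, rate=example_rate)
example_t_loc = local_time(required_resources=4.0, local_resources=2.0)
```

With these placeholder inputs, offloading costs 2 time units of transmission before any cloud processing starts, and local execution also takes 2 time units, so the offloading decision would hinge on the cloud processing time and the energy terms.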

3.4. Energy Consumption Model

The transmission and processing of vehicle tasks consume energy, which should be minimized while adhering to the maximum tolerable task delay. The local processing energy consumption, $E_{loc}$, is expressed as follows:
$$E_{loc} = P_{loc}\, t_i^{loc}$$
where $P_{loc}$ denotes the local processing power.
If the task is transmitted to the autonomous vehicle cloud for processing, the energy consumed, $E_{avcc}$, is expressed as follows:
$$E_{avcc} = P_{avcc}\, t_i^{proc}$$
where $P_{avcc}$ denotes the processing power of the autonomous vehicle cloud.

3.5. Dependency Task-Offloading Problem Formulation

In the DAG model used for modeling dependent tasks, time-sensitive or resource-constrained tasks are given additional priority by the topological sorting algorithm. This ensures that critical, latency-sensitive operations are not delayed when resource availability is limited. Thanks to this approach, the LT-TD3 can more effectively manage complex, interdependent task sequences, which improves the overall responsiveness of the system and ensures robust performance in real-time vehicular environments when resources are scarce. In this paper, the control center within the autonomous vehicle cloud allocates resource units to complete the vehicle tasks with dependencies, ensuring that tasks are completed within the maximum tolerable delay while reducing energy consumption. The formulation is as follows:
$$\max_{N_{unit}} \sum_{n=1}^{N} \sum_{i=1}^{N_i} \left[ X_{in}\left(t_{in}^{max} - t_{in}^{total}\right) - E_{avcc} - Y_{in}\left(t_{in}^{max} - t_{in}^{max}\right) - E_{loc} \right]$$
where $t_{in}^{max}$ denotes the maximum tolerable task delay, and $t_{in}^{total}$ denotes the total task completion time, which can be expressed as $t_{in}^{total} = X_{in}(t_c + t^{proc}) + Y_{in}\, t^{loc}$. It has the following constraints:
$$C1: X_{in}, Y_{in} \in [0, 1]$$
$$C2: t_{in}^{max} \ge t_{in}^{total}, \quad \forall n \in N,\ i \in N_i$$
where constraint $C1$ represents the offloading location of the task: $X_{in}$ and $Y_{in}$ are marking bits, in which $X_{in} = 1$ indicates execution in the autonomous vehicle cloud and $Y_{in} = 1$ indicates local execution in the task vehicle, meaning a subtask can only be executed in one location. Constraint $C2$ indicates that the delay of every task $n$ cannot exceed its maximum tolerable delay.
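The two constraints can be checked per subtask with a few lines of code. The sketch below is only an illustration: the function names are hypothetical, and the condition `x + y == 1` encodes one reading of the statement that a subtask can only be executed in one location.

```python
def total_time(x, y, t_c, t_proc, t_loc):
    """t_total = X * (t_c + t_proc) + Y * t_loc for a single subtask."""
    return x * (t_c + t_proc) + y * t_loc


def is_feasible(x, y, t_c, t_proc, t_loc, t_max):
    """Check C1 (binary flags, exactly one location; an assumed reading) and C2."""
    one_location = x in (0, 1) and y in (0, 1) and x + y == 1
    return one_location and total_time(x, y, t_c, t_proc, t_loc) <= t_max


# Placeholder timings: offloading finishes in 0.5, local execution needs 2.0.
offload_ok = is_feasible(x=1, y=0, t_c=0.2, t_proc=0.3, t_loc=2.0, t_max=1.0)
local_ok = is_feasible(x=0, y=1, t_c=0.2, t_proc=0.3, t_loc=2.0, t_max=1.0)
```

Here the offloaded placement meets the deadline of 1.0 while the local placement does not, so only the first satisfies $C2$.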

4. Algorithm Design and Implementation

For the DAG task model in the autonomous vehicle cloud, this paper proposes an improved twin-delayed deep deterministic policy gradient (TD3) algorithm. The deep deterministic policy gradient (DDPG) algorithm [44] builds on the deep Q-network (DQN) algorithm [45], which cannot address continuous action control. While DDPG resolves this limitation, it still suffers from certain drawbacks, which TD3 [46] aims to overcome. Moreover, we have enhanced the foundational TD3 network by integrating the self-attention mechanism and LSTM to further boost network performance and efficiency. The self-attention mechanism focuses on key sections of the input data, dynamically adjusting weights to emphasize critical information. This approach effectively manages long-term dependencies, enabling the model to prioritize relevant details and filter out less essential data. LSTM is designed to capture both short-term and long-term dependencies in sequential data. It mitigates the vanishing gradient problem and retains important information over long sequences, which suits dynamic task requirements.
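To make the weighting idea behind self-attention concrete, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes the standard scaled dot-product form with identity query, key, and value projections, that is, with no learned weights; the layer sizes, projections, and the way attention is combined with LSTM in LT-TD3 are not specified in this section, so this is not the authors' network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    sequence is a list of equal-length feature vectors (one per time step).
    Each output vector is a weighted average of all input vectors, with
    weights given by the softmax of scaled dot-product similarities.
    """
    dim = len(sequence[0])
    outputs = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        outputs.append([sum(w * value[c] for w, value in zip(weights, sequence))
                        for c in range(dim)])
    return outputs


# Two orthogonal time steps: each step attends more strongly to itself.
example_out = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Because the attention weights in each row sum to one, every output is a convex combination of the inputs, which is the sense in which attention emphasizes some time steps and down-weights others.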

4.1. Environment Model

First, the agent and environment are defined through the state space $S_t$, the action space $A_t$, and the reward function $R_t$.

4.1.1. State Space

The state space $S_t$ is defined as follows:
$$S_t = \{D_t, S_t, t_t^{max}, Q, f_{loc}, f_{c1}, f_{c2}\}$$
where $D_t$, $S_t$, and $t_t^{max}$ represent the triple-attribute task information, $f_{c1}$ represents the available resource units of the autonomous vehicles, $f_{c2}$ represents the available computing resources of traditional vehicles, $f_{loc}$ denotes the local computing resources, and $Q$ represents the task-sorting results in the DAG.

4.1.2. Action Space

The action space $A_t$ is defined as follows:
$$A_t = \{a_1, a_2, \ldots, a_n\}, \quad a \in \{0, 1, 2, \ldots, k\}$$
where $a_n$ refers to the resource units allocated to the $n$th task, and $a_n = 0$ represents that the task is processed locally. Moreover, the allocated resource units cannot exceed the total number $k$ of resource units in the current autonomous vehicle cloud.

4.1.3. Reward Function

The objective of this paper is to complete the tasks within the maximum tolerable delay while reducing energy consumption as far as possible. The reward $R_t$ is defined as follows:
$$R_t = \begin{cases} K \log_2\left(1 + \left(t_n^{max} - t_n^{total}\right)\right) - \left(X_{in} E_{vcc} + Y_{in} E_{loc}\right), & t_n^{max} \ge t_n^{total} \\ -T, & t_n^{max} < t_n^{total} \end{cases}$$
where $t_n^{max}$ denotes the maximum acceptable task delay, $t_n^{total}$ denotes the total time consumed in task processing, $K$ denotes a positive constant, and $X_{in}$ and $Y_{in}$ denote flag bits representing the task-offloading positions. $E_{vcc}$ and $E_{loc}$ represent the total energy consumption values generated by the tasks in the autonomous vehicle cloud and during local processing, respectively. In addition, a penalty is given when the total task delay exceeds the maximum tolerable task delay, where the penalty value is $T = 1000$.
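The reward can be written as a small function. This is a sketch under stated assumptions: the energy term is taken to be subtracted from the logarithmic delay term, the penalty is taken to enter as a reward of $-T$, and the default `k=1.0` is a placeholder because the value of $K$ is not given here. The function and parameter names are hypothetical.

```python
import math

PENALTY_T = 1000.0  # penalty value T stated in the text


def reward(t_max, t_total, x, y, e_vcc, e_loc, k=1.0, penalty=PENALTY_T):
    """Reward for one task.

    Deadline met: K * log2(1 + slack) minus the energy of the chosen location.
    Deadline missed: a fixed negative penalty (assumed sign convention).
    """
    if t_total > t_max:
        return -penalty
    slack = t_max - t_total
    return k * math.log2(1.0 + slack) - (x * e_vcc + y * e_loc)


# Placeholder numbers: one unit of slack and an offloaded task costing 0.5.
on_time = reward(t_max=2.0, t_total=1.0, x=1, y=0, e_vcc=0.5, e_loc=0.8)
too_late = reward(t_max=1.0, t_total=2.0, x=1, y=0, e_vcc=0.5, e_loc=0.8)
```

In the on-time case the reward is $\log_2(2) - 0.5 = 0.5$, whereas missing the deadline yields $-1000$, so a single violation outweighs many small energy savings.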

4.2. LT-TD3-Based Task-Offloading Algorithm

To address the challenge of task offloading with dependencies, tasks are decomposed into subtasks that are processed using corresponding resource units allocated by the control center. This paper utilizes an improved TD3 algorithm, which introduces three key enhancements to DDPG and integrates the self-attention mechanism and LSTM to manage these tasks. Firstly, following the double DQN (DDQN) approach, TD3 uses two critic networks to estimate the Q-value and takes the smaller of the two estimates to form the target, which mitigates the overestimation issue in DDPG. Secondly, TD3 implements a delayed update mechanism, in which the actor is updated only after multiple updates to the critic; this reduces fluctuation during optimization and yields a more stable policy. Finally, TD3 introduces target policy smoothing, which injects noise into the target action and encourages the network to explore diverse actions. This prevents the policy network from prematurely focusing on narrow action choices, enhancing sample efficiency and accelerating algorithm convergence.
Moreover, integrating the self-attention mechanism and LSTM into the critic and actor improves the long-dependence capture ability and interoperability of the network, allowing it to effectively capture temporal and contextual dependencies in real-time. By evaluating and weighing time series data, attention focuses on the data key information to conduct informed decision-making. Unlike traditional DRL methods, LT-TD3 utilizes LSTM to model long-term temporal relationships for vehicle mobility and task demands, allowing the system to predict and adjust strategy with greater stability under environmental fluctuations, which enhances its capability to dynamically adapt to vehicular environments. Furthermore, the self-attention mechanism allows the algorithm to dynamically focus on critical features and pay close attention to critical changes in the network environment, enhancing the decision-making of LT-TD3 in scenarios where multiple factors are rapidly changing. This combination enables LT-TD3 to offer more precise and efficient task-offloading decisions under highly dynamic conditions. The network parameter optimization details are described below.
Following the principles of DDQN, TD3 employs two critic networks in both the main and target networks. The smaller Q-value from the two critic networks is chosen to compute the target value $y_t$, described as follows:
$$y_t = R(s_t, a_t) + \min_{i=1,2} Q_{\theta_i'}(s_{t+1}, \tilde{a}_t)$$
where $\tilde{a}_t$ denotes the combination of action $a_t$ and random noise, which is introduced to avoid falling into local optima during exploration of the action space. This is similar to the $\varepsilon$-greedy principle in the DQN algorithm. $\tilde{a}_t$ is specified as follows:
$$\tilde{a}_t = \pi_{\phi'}(s_{t+1}) + \mathrm{clip}(\eta, -\tau, \tau)$$
where $\eta \sim \mathcal{N}(0, \xi)$, and $\tau$ denotes the cut-off value.
The actor network updates its parameters using deterministic policy gradients, which are expressed as follows:
ϕ J ( ϕ ) = 1 N n = 1 N ( a Q θ 1 ( s , a ) s = s t , a = a t ) ϕ π ϕ ( s t )
The parameters are updated according to the following formulation:
γ ω γ + ( 1 ω ) γ
ϕ ω ϕ + ( 1 ω ) ϕ
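The target value and the parameter update above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the actor and critics are toy functions, $\xi$ is treated as a standard deviation, and the names `td3_target`, `smoothed_target_action`, and `soft_update` are hypothetical. No discount factor is applied because the target equation in the text shows none, and the same update rule with coefficient $\omega$ is applied to any parameter list.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(target_actor, next_state, xi, tau, rng):
    """Target action plus clipped Gaussian noise: pi'(s') + clip(eta, -tau, tau)."""
    action = target_actor(next_state)
    # eta ~ N(0, xi); xi is used as a standard deviation here (assumption).
    return [a + clip(rng.gauss(0.0, xi), -tau, tau) for a in action]


def td3_target(reward, next_state, target_actor, target_critics, xi, tau, rng):
    """y = R + min_i Q'_i(s', a~), taking the smaller of the two critic estimates."""
    a_tilde = smoothed_target_action(target_actor, next_state, xi, tau, rng)
    return reward + min(q(next_state, a_tilde) for q in target_critics)


def soft_update(target_params, main_params, omega):
    """target <- omega * main + (1 - omega) * target, element by element."""
    return [omega * m + (1.0 - omega) * t
            for m, t in zip(main_params, target_params)]


# Toy stand-ins for the target networks (hypothetical, for illustration only).
def toy_actor(state):
    return [0.5 * sum(state)]


def toy_critic_1(state, action):
    return sum(state) + action[0]


def toy_critic_2(state, action):
    return sum(state) + 2.0 * action[0]


rng = random.Random(0)
y = td3_target(1.0, [0.2, 0.4], toy_actor, [toy_critic_1, toy_critic_2],
               xi=0.2, tau=0.5, rng=rng)
print("target value:", y)
print("soft update:", soft_update([0.0, 0.0], [1.0, 2.0], omega=0.1))
```

Because the noise is clipped to $[-\tau, \tau]$, the smoothed action never moves further than the cut-off value from the target actor's output, which bounds how much the target value can vary.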
Considering the time-varying variables in autonomous vehicle environments, this section proposes LT-TD3, a dependent task-offloading algorithm. LT-TD3 comprises three components: the main network, the target network, and the experience pool. Both the main network and the target network consist of three deep neural networks: two critic networks and one actor. The actor maps the state in the autonomous vehicle cloud to a specific action (for example, the allocation of resource units), exploring and identifying the optimal policy. The two critic networks evaluate the performance of the current policy, providing feedback to support the actor's learning process. Additionally, the self-attention mechanism and LSTM are integrated into both the critic and actor networks to capture long-term dependencies, which accelerates network convergence and enhances the system's decision-making capabilities. The specific structure of the LT-TD3 algorithm is illustrated in Figure 3, and its pseudocode is provided in Algorithm 1.
Algorithm 1: LT-TD3 algorithm.
Input: System state information
Output: Resource unit allocation
Initialize the experience pool $D$
Initialize critics $Q_{\theta_1}$, $Q_{\theta_2}$ and actor $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, $\phi$
Initialize the target networks $\theta_1' \leftarrow \theta_1$, $\theta_2' \leftarrow \theta_2$, $\phi' \leftarrow \phi$
for each episode do
    for $t = 1$ to $T$ do
        Observe the state $s_t$ and reward $R_t$
        Choose the action $a_t$ with exploration noise
        Update state $s_{t+1}$
        Store quadruple $(s_t, a_t, R_t, s_{t+1})$ in $D$
        Sample mini-batch $(s_\rho, a_\rho, R_\rho, s_{\rho+1})$ from $D$
        Calculate target Q:
            $y_\rho = R_\rho + \min_{i=1,2} Q_{\theta_i'}(s_{\rho+1}, \tilde{a}_\rho)$
        Update the parameters $\theta_i$:
            $\theta_i \leftarrow \arg\min_{\theta_i} \frac{1}{N} \sum \left( y_\rho - Q_{\theta_i}(s_\rho, a_\rho) \right)^2$
        if $t \bmod d = 0$ then
            Update $\phi$ with the deterministic policy gradient:
                $\nabla_{\phi} J(\phi) = \frac{1}{N} \sum_{n=1}^{N} \left( \nabla_{a} Q_{\theta_1}(s, a) \big|_{s = s_t,\, a = a_t} \right) \nabla_{\phi} \pi_{\phi}(s_t)$
            Update the target-network parameters:
                $\gamma' \leftarrow \omega \gamma + (1 - \omega) \gamma'$
                $\phi' \leftarrow \omega \phi + (1 - \omega) \phi'$
        end if
    end for
end for
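The control flow of Algorithm 1, namely the experience pool and the delayed actor update, can be sketched as follows. This is only a skeleton under stated assumptions: the environment transition, reward, and actor output are placeholders, the class name `ExperiencePool` and the function `run_episode` are hypothetical, and the delay `policy_delay` (the $d$ in Algorithm 1) is set to 2 purely for illustration. The pool capacity of 500 and batch size of 32 follow the values reported in Section 5.1.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity buffer of (s, a, R, s_next) tuples; oldest entries drop first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def run_episode(pool, steps, policy_delay, batch_size, rng):
    """Walk the inner loop of Algorithm 1 and count critic and actor updates."""
    critic_updates = 0
    actor_updates = 0
    state = 0.0
    for t in range(1, steps + 1):
        action = rng.uniform(-1.0, 1.0)   # placeholder for actor output plus noise
        next_state = state + action       # placeholder environment transition
        reward = -abs(next_state)         # placeholder reward
        pool.store(state, action, reward, next_state)
        batch = pool.sample(batch_size, rng)
        if batch:
            critic_updates += 1           # both critics are updated every step
        if t % policy_delay == 0:
            actor_updates += 1            # actor and targets updated every d steps
        state = next_state
    return critic_updates, actor_updates


pool = ExperiencePool(capacity=500)
critic_n, actor_n = run_episode(pool, steps=1000, policy_delay=2,
                                batch_size=32, rng=random.Random(0))
print(critic_n, actor_n, len(pool))
```

With a delay of 2 the actor is updated half as often as the critics, which is the mechanism the text credits with stabilizing the policy.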

5. Simulation Results and Analysis

5.1. Experimental Environments

Data collected using the VISSIM simulator were employed for the experiment. The experimental environment included Windows 10, an i7-6700HQ CPU, and 16 GB of memory. To enhance algorithm stability and speed up convergence, an adaptive cosine annealing learning rate strategy with warm-up training was used to guide network parameters progressively and smoothly toward optimal values. The learning rate was set at 0.01, with a discount factor δ and ε value of 0.9. Moreover, the memory size and batch size were configured to 500 and 32, respectively, to reduce memory consumption during training and thus improve training efficiency. It is noteworthy that VISSIM offers a detailed and controllable environment for simulating traffic flow and network conditions, but it may not fully capture the complexities of real-world vehicular networks, including environmental interference, hardware variability, and unpredictable congestion patterns. Therefore, we integrated real-world vehicular environment data and added vehicular network conditions to VISSIM to narrow this gap. Furthermore, the modular design of LT-TD3 allows components such as reward functions and resource constraints to be changed to adapt to unfamiliar scenarios without a redesign.
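For readers unfamiliar with the schedule, a plain cosine annealing learning rate with linear warm-up can be sketched as below. This is a generic, non-adaptive version and not necessarily the exact variant used in the experiments: the function name `warmup_cosine_lr`, the warm-up length, the total step count, and the minimum learning rate are all assumptions, and only the base learning rate of 0.01 comes from the text.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr=0.01, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay from base_lr to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Ramp up linearly so early updates use a small learning rate.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule: 100 warm-up steps out of 1000 training steps.
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_cosine_lr(step, total_steps=1000, warmup_steps=100))
```

The warm-up phase avoids large parameter jumps while the critics are still poorly fitted, and the cosine phase reduces the step size smoothly toward the end of training.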

5.2. Experimental Results and Analysis

To evaluate the performance of the proposed LT-TD3 algorithm, this paper compares it against baselines including DQN, PPO, DDPG, TD3, and local-only processing (no offloading). The experimental metrics are the average task delay and the task completion rate, which significantly impact the performance and user experience of real-world vehicles. Vehicle tasks with dependencies are generally more complex. A lower average task delay indicates faster task completion, ensuring that the vehicle receives processing results in a timely manner, especially in scenarios that require rapid responses, such as sudden obstacle avoidance and traffic signal recognition. This significantly improves the real-time performance and stability of the system. Similarly, the task completion rate is a crucial metric that measures the percentage of tasks completed within their deadlines. A higher task completion rate reflects more efficient resource utilization and fewer tasks exceeding their maximum tolerable delay. A high task completion rate enables vehicles to continuously perform various computation-intensive tasks, including path planning, object detection, and driving decisions, thus enhancing the safety and reliability of driving.
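The two metrics can be stated concretely with a short sketch. The function names and the sample delays and deadlines below are hypothetical and serve only to show how the quantities are computed; they are not data from the experiments.

```python
def average_task_delay(delays):
    """Mean completion delay in seconds over all tasks."""
    if not delays:
        raise ValueError("delays must be non-empty")
    return sum(delays) / len(delays)


def task_completion_rate(delays, deadlines):
    """Percentage of tasks whose delay does not exceed their tolerable delay."""
    if not delays or len(delays) != len(deadlines):
        raise ValueError("delays and deadlines must be non-empty and equal in length")
    completed = sum(1 for delay, limit in zip(delays, deadlines) if delay <= limit)
    return 100.0 * completed / len(delays)


# Hypothetical per-task delays and maximum tolerable delays, in seconds.
delays = [0.8, 1.2, 2.6, 3.1]
deadlines = [2.0, 2.5, 3.0, 3.0]
print(average_task_delay(delays))
print(task_completion_rate(delays, deadlines))
```

A task counts as completed only if it finishes within its own deadline, so the two metrics can move independently: a few very slow tasks raise the average delay without necessarily lowering the completion rate much.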
Table 1 and Figure 4 show the average task delay for varying numbers of vehicles in the autonomous vehicle cloud, which consists of 50% autonomous vehicles and 50% conventional vehicles. As the number of vehicles increases, the state and action spaces become more complex, while the additional nearby vehicles contribute more resource units and thus a larger resource pool for allocation. Compared to the baselines, LT-TD3 allocates resource units more effectively and keeps the average task delay between 0.64 s and 1.10 s across all vehicle counts. In contrast, the worst delay among the learning-based baselines reaches 2.90 s (PPO with 8 vehicles), and local-only processing takes 4.72 s, which affects the decision-making efficiency of the system.
Table 2 and Figure 5 display the average task delay for different task data sizes in the autonomous vehicle cloud. The task data size is a critical factor affecting the average task delay: the larger the task data, the greater the transmission time and computing resources required, making the task more challenging to complete. In this experiment, the initial task size (100%) is set between 10 and 40 MB. As the task size increases from 100% to 120%, the average task delay rises for all algorithms. At 120%, the delay of every baseline model exceeds 2 s, whereas LT-TD3 maintains the lowest average task delay at 1.59 s. This demonstrates that LT-TD3 allocates an appropriate number of resource units, ensuring the timely completion of tasks even when handling large volumes of task data.
Table 3 and Figure 6 illustrate the task completion rates under varying maximum tolerable delays. The task completion rate is the proportion of tasks completed within the maximum acceptable delay, which determines whether a task is completed on time. The experiment simulates different acceptable delay limits, with the initial acceptable delay (100%) set at 2–3 s. As the tolerable delay decreases to 70%, the task completion rate declines for all baselines: the lowest rate among the learning-based baselines falls to 49.6% (DDPG), and local-only processing completes only 16.3% of tasks. LT-TD3, however, still achieves a 99.2% task completion rate. Conversely, as the acceptable delay increases to 120%, the highest completion rate among the baseline models improves to 98.5% (TD3), but it still falls short of the 99.1% achieved by LT-TD3. Although limited local resources constrain task completion rates, LT-TD3 demonstrates the critical importance of efficient task offloading. In summary, LT-TD3 maintains a consistently high task completion rate, whether handling tasks with strict or relaxed delay requirements.
Table 4 and Figure 7 present the average task delay under varying resource availability at edge devices. This study focuses on vehicle tasks with dependencies, aiming to identify an optimal resource allocation strategy that ensures efficient task completion. The number of resource units provided by edge devices, such as roadside units (RSUs), traditional vehicles, and autonomous vehicles, is a critical factor influencing whether offloaded tasks can be completed within the required time. Initially, the number of resource units is set to 100%. When resources decrease to 70%, the average task delay rises for every offloading algorithm: the baseline models all exceed 2 s, while LT-TD3 maintains the lowest delay at 1.81 s. When edge device resources increase to 120%, LT-TD3 again achieves the lowest average task delay of 1.03 s, whereas the best baseline reaches only 1.49 s. These experimental results highlight that the LT-TD3 algorithm can dynamically adapt to different levels of edge device resources by selecting a suitable resource allocation strategy, ensuring lower task delay, improved vehicle safety, and an enhanced driving experience.
While the proposed LT-TD3 algorithm demonstrates promising results in optimizing task offloading and resource allocation in vehicular networks, the system performance may be degraded when confronting various extreme emergencies in large-scale networks with a high density of edge nodes and vehicles. Moreover, managing task dependencies using DAGs and optimizing them with topological sorting algorithms may not be optimal as the network size increases. In future work, we will enhance the LT-TD3 network by designing powerful feature extraction modules and integrating V2I and V2V networks to provide more options for the system; this expansion aims to broaden its applicability, especially in scenarios with high dynamic complexity. Furthermore, exploring alternative solutions for managing task dependencies will refine task dependency management, enabling more effective handling of intricate dependency structures, which contributes to greater system stability and adaptability.
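For reference, the dependency-ordering step mentioned above can be illustrated with Kahn's algorithm, a standard topological sort over a DAG. This sketch is not taken from the paper: the function name `topological_priorities` and the five-subtask graph are hypothetical, and the paper's own priority assignment may differ in how it breaks ties between subtasks that are ready at the same time.

```python
from collections import deque


def topological_priorities(num_subtasks, edges):
    """Return subtasks in an order where every predecessor comes first.

    edges is a list of (u, v) pairs meaning subtask u must finish before
    subtask v can start. Raises ValueError if the graph has a cycle.
    """
    successors = [[] for _ in range(num_subtasks)]
    in_degree = [0] * num_subtasks
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Subtasks with no unfinished predecessors are ready to be scheduled.
    ready = deque(i for i in range(num_subtasks) if in_degree[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)

    if len(order) != num_subtasks:
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical DAG: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(topological_priorities(5, edges))
```

The sort visits each subtask and each dependency edge once, so its cost grows linearly with the size of the DAG; the ordering itself is inexpensive, and any scaling concern lies in how well a fixed priority order matches the resources actually available.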

6. Conclusions

Addressing the issue of delay and energy consumption in vehicle task offloading within the autonomous vehicle cloud environment, this paper utilizes RSUs and nearby vehicles (both traditional and autonomous) to form an autonomous driving network that provides resource units for task vehicles. Additionally, the task-offloading issue is formulated as a Markov decision process, and the LT-TD3 algorithm is proposed to solve it, which jointly optimizes delay and energy consumption under the constraint of limited resource unit availability. To further enhance agent performance and efficiency, LSTM and a self-attention mechanism are integrated into the LT-TD3 algorithm. This enables the network to capture long-term dependencies, which accelerates network convergence and enhances feature extraction capabilities. Considering the strongly dependent tasks, a topological sorting algorithm is employed to prioritize subtasks with dependencies, allowing for execution in a logical order, enhancing the efficiency of task scheduling, and minimizing processing delays and energy consumption. Comprehensive experiments demonstrate that the average task delay and completion rate of LT-TD3 outperform those of baseline models under various conditions. In the future, we will further enhance the algorithm by designing reward functions and resource constraints tailored for various autonomous environments. This will be validated in diverse settings, including rural areas with sparse edge nodes and urban scenarios with high communication interference, with the aim to increase the robustness and flexibility of the task-offloading strategy to meet the demands of high-bandwidth, low-latency applications required by autonomous vehicles. Moreover, the development of 5G and 6G technologies allows the algorithm to be extended to leverage these networks, further improving offloading efficiency and reducing task latency. 
Increasing compatibility with different vehicle communication standards and handling interoperability with transportation systems to ensure that the LT-TD3 can be readily deployed in real-world networks is also a challenge we will explore in the future.

Author Contributions

Writing—original draft, Y.Z.; writing—review and editing, X.P. and X.Z.; project administration, C.Z.; supervision, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported in part by the Natural Science Foundation of Liaoning Province (grant No. 2024-bs-102), the Research Program of the Liaoning Liaohe Laboratory (grant No. LLL24KF-01-01), and the Basic Scientific Research Project of the Education Department of Liaoning Province (grant No. LJ222410142043).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are unavailable due to privacy concerns.

Acknowledgments

The authors are thankful to the anonymous reviewers and editors for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hildebrand, B.; Baza, M.; Salman, T.; Tabassum, S.; Konatham, B.; Amsaad, F.; Razaque, A. A comprehensive review on blockchains for Internet of Vehicles: Challenges and directions. Comput. Sci. Rev. 2023, 48, 100547. [Google Scholar] [CrossRef]
  2. Jameel, F.; Chang, Z.; Huang, J.; Ristaniemi, T. Internet of autonomous vehicles: Architecture, features, and socio-technological challenges. IEEE Wirel. Commun. 2019, 26, 21–29. [Google Scholar] [CrossRef]
  3. Cui, Y.; Li, H.; Zhang, D.; Zhu, A.; Li, Y.; Qiang, H. Multiagent reinforcement learning-based cooperative multitype task offloading strategy for internet of vehicles in B5G/6G network. IEEE Internet Things J. 2023, 10, 12248–12260. [Google Scholar] [CrossRef]
  4. Sun, Y.; Wu, Z.; Meng, K.; Zheng, Y. Vehicular task offloading and job scheduling method based on cloud-edge computing. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14651–14662. [Google Scholar] [CrossRef]
  5. Mahdizadeh, M.; Montazerolghaem, A.; Jamshidi, K. Task Scheduling and Load Balancing in SDN-based Cloud Computing: A Review of Relevant Research. J. Eng. Res. 2024; in press. [Google Scholar] [CrossRef]
  6. Zhuang, W.; Ye, Q.; Lyu, F.; Cheng, N.; Ren, J. SDN/NFV-empowered future IoV with enhanced communication, computing, and caching. Proc. IEEE 2019, 108, 274–291. [Google Scholar] [CrossRef]
  7. Jangjou, M.; Sohrabi, M.K. A comprehensive survey on security challenges in different network layers in cloud computing. Arch. Comput. Methods Eng. 2022, 29, 3587–3608. [Google Scholar] [CrossRef]
  8. Liu, L.; Chen, C.; Pei, Q.; Maharjan, S.; Zhang, Y. Vehicular edge computing and networking: A survey. Mob. Netw. Appl. 2021, 26, 1145–1168. [Google Scholar] [CrossRef]
  9. Bai, J.; Gui, J.; Huang, G.; Dong, M.; Wang, T.; Zhang, S.; Liu, A. A lowcost UAV task offloading scheme based on trustable and trackable data routing. IEEE Trans. Intell. Veh. 2023; early access. [Google Scholar] [CrossRef]
  10. Liu, L.; Feng, J.; Pei, Q.; Chen, C.; Ming, Y.; Shang, B.; Dong, M. Blockchain-enabled secure data sharing scheme in mobile-edge computing: An asynchronous advantage actor–critic learning approach. IEEE Internet Things J. 2020, 8, 2342–2353. [Google Scholar] [CrossRef]
  11. Montazerolghaem, A.; Yaghmaee, M.H.; Leon-Garcia, A. Green Cloud Multimedia Networking: NFV/SDN based Energy-efficient Resource Allocation. IEEE Trans. Green Commun. Netw. 2020, 4, 873–889. [Google Scholar] [CrossRef]
  12. Salehnia, T.; Montazerolghaem, A.; Mirjalili, S.; Khayyambashi, M.R.; Abualigah, L. SDN-based optimal task scheduling method in Fog-IoT network using combination of AO and WOA. In Handbook of Whale Optimization Algorithm; Elsevier: Amsterdam, The Netherlands, 2024; pp. 109–128. [Google Scholar]
  13. Montazerolghaem, A. Efficient resource allocation for multimedia streaming in software-defined internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14718–14731. [Google Scholar] [CrossRef]
  14. Guo, J.; Kurup, U.; Shah, M. Is it safe to drive? An overview of factors, metrics, and datasets for driveability assessment in autonomous driving. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3135–3151. [Google Scholar] [CrossRef]
  15. Guo, H.; Liu, J.; Lv, J. Toward intelligent task offloading at the edge. IEEE Netw. 2019, 34, 128–134. [Google Scholar] [CrossRef]
  16. Chen, R.; Fan, Y.; Yuan, S.; Hao, Y. Vehicle Collaborative Partial Offloading Strategy in Vehicular Edge Computing. Mathematics 2024, 12, 1466. [Google Scholar] [CrossRef]
  17. Liu, J.; Ahmed, M.; Mirza, M.A.; Khan, W.U.; Xu, D.; Li, J.; Aziz, A.; Han, Z. RL/DRL meets vehicular task offloading using edge and vehicular cloudlet: A survey. IEEE Internet Things J. 2022, 9, 8315–8338. [Google Scholar] [CrossRef]
  18. Sun, D.; Chen, Y.; Li, H. Intelligent Vehicle Computation Offloading in Vehicular Ad Hoc Networks: A Multi-Agent LSTM Approach with Deep Reinforcement Learning. Mathematics 2024, 12, 424. [Google Scholar] [CrossRef]
  19. Imanpour, S.; Montazerolghaem, A.; Afshari, S. Load Balancing of Servers in Software-defined Internet of Multimedia Things using the Long Short-Term Memory Prediction Algorithm. In Proceedings of the 2024 10th International Conference on Web Research (ICWR), Tehran, Iran, 24–25 April 2024; pp. 291–296. [Google Scholar]
  20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention is All you Need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  21. Zeng, F.; Chen, Q.; Meng, L.; Wu, J. Volunteer assisted collaborative offloading and resource allocation in vehicular edge computing. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3247–3257. [Google Scholar] [CrossRef]
  22. Zhu, C.; Tao, J.; Pastor, G.; Xiao, Y.; Ji, Y.; Zhou, Q.; Li, Y.; Ylä-Jääski, A. Folo: Latency and quality optimized task allocation in vehicular fog computing. IEEE Internet Things J. 2018, 6, 4150–4161. [Google Scholar] [CrossRef]
  23. Fan, W.; Su, Y.; Liu, J.; Li, S.; Huang, W.; Wu, F.; Liu, Y. Joint task offloading and resource allocation for vehicular edge computing based on V2I and V2V modes. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4277–4292. [Google Scholar] [CrossRef]
  24. Zhu, A.; Wen, Y. Computing offloading strategy using improved genetic algorithm in mobile edge computing system. J. Grid Comput. 2021, 19, 38. [Google Scholar] [CrossRef]
  25. Li, H.; Xu, H.; Zhou, C.; Lü, X.; Han, Z. Joint optimization strategy of computation offloading and resource allocation in multi-access edge computing environment. IEEE Trans. Veh. Technol. 2020, 69, 10214–10226. [Google Scholar] [CrossRef]
  26. Huang, P.Q.; Wang, Y.; Wang, K.; Liu, Z.Z. A bilevel optimization approach for joint offloading decision and resource allocation in cooperative mobile edge computing. IEEE Trans. Cybern. 2019, 50, 4228–4241. [Google Scholar] [CrossRef]
  27. Wang, W.; Deng, H.; Sun, M.; Pan, Z. A cloud-connected autonomous driving system. In Proceedings of the 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 10–13 April 2020; pp. 96–102. [Google Scholar]
  28. Ning, Z.; Dong, P.; Wang, X.; Rodrigues, J.J.; Xia, F. Deep reinforcement learning for vehicular edge computing: An intelligent offloading system. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–24. [Google Scholar] [CrossRef]
  29. Sun, F.; Cheng, N.; Zhang, S.; Zhou, H.; Gui, L.; Shen, X. Reinforcement learning based computation migration for vehicular cloud computing. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6. [Google Scholar]
  30. Qi, Q.; Wang, J.; Ma, Z.; Sun, H.; Cao, Y.; Zhang, L.; Liao, J. Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2019, 68, 4192–4203. [Google Scholar] [CrossRef]
  31. Lee, S.S.; Lee, S. Resource allocation for vehicular fog computing using reinforcement learning combined with heuristic information. IEEE Internet Things J. 2020, 7, 10450–10464. [Google Scholar] [CrossRef]
  32. Wang, J.; Hu, J.; Min, G.; Zomaya, A.Y.; Georgalas, N. Fast adaptive task offloading in edge computing based on meta reinforcement learning. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 242–253. [Google Scholar] [CrossRef]
  33. Tang, M.; Wong, V.W. Deep reinforcement learning for task offloading in mobile edge computing systems. IEEE Trans. Mob. Comput. 2020, 21, 1985–1997. [Google Scholar] [CrossRef]
  34. Cao, Z.; Deng, X.; Yue, S.; Jiang, P.; Ren, J.; Gui, J. Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning. IEEE Internet Things J. 2024, 11, 21632–21646. [Google Scholar] [CrossRef]
  35. Wang, S.; Bi, S.; Zhang, Y.J.A. Deep reinforcement learning with communication transformer for adaptive live streaming in wireless edge networks. IEEE J. Sel. Areas Commun. 2021, 40, 308–322. [Google Scholar] [CrossRef]
  36. Gao, Z.; Yang, L.; Dai, Y. Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing. IEEE Internet Things J. 2022, 10, 6818–6835. [Google Scholar] [CrossRef]
  37. Lu, H.; Gu, C.; Luo, F.; Ding, W.; Liu, X. Optimization of lightweight task offloading strategy for mobile edge computing based on deep reinforcement learning. Future Gener. Comput. Syst. 2020, 102, 847–861. [Google Scholar] [CrossRef]
  38. Cheng, Y.; Vijayaraj, A.; Pokkuluri, K.S.; Salehnia, T.; Montazerolghaem, A.; Rateb, R. Vehicular Fog Resource Allocation Approach for VANETs Based on Deep Adaptive Reinforcement Learning Combined with Heuristic Information. IEEE Access 2024, 12, 139056–139075. [Google Scholar] [CrossRef]
  39. Liu, Z.; Huang, L.; Gao, Z.; Luo, M.; Hosseinalipour, S.; Dai, H. GA-DRL: Graph Neural Network-Augmented Deep Reinforcement Learning for DAG Task Scheduling over Dynamic Vehicular Clouds. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4226–4242. [Google Scholar] [CrossRef]
  40. Qiu, B.; Wang, Y.; Xiao, H.; Zhang, Z. Deep Reinforcement Learning-Based Adaptive Computation Offloading and Power Allocation in Vehicular Edge Computing Networks. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13339–13349. [Google Scholar] [CrossRef]
  41. Liao, Z.; Yuan, C.; Zheng, B.; Tang, X. An Adaptive Deployment Scheme of Unmanned Aerial Vehicles in Dynamic Vehicle Networking for Complete Offloading. IEEE Internet Things J. 2024, 11, 23509–23520. [Google Scholar] [CrossRef]
  42. Tang, H.; Du, M.; Wu, H.; Jiao, P. Link Topology-Adaptive Offloading Method On Vehicular Edge Computing. In Proceedings of the IEEE INFOCOM 2024-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 20 May 2024; pp. 1–2. [Google Scholar]
  43. Yang, Y.; Shao, C.; Zuo, J.; Shi, C. Energy Efficient Algorithm for Multi-User Adaptive Edge Computing Offloading in Vehicular Networks Based on Meta Reinforcement Learning. In Proceedings of the 2024 7th World Conference on Computing and Communication Technologies (WCCCT), Chengdu, China, 12–14 April 2024; pp. 250–254. [Google Scholar]
  44. Lillicrap, T. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  45. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  46. Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596. [Google Scholar]
Figure 1. Autonomous vehicle cloud offloading scenario.
Figure 2. Subtask dependency. (a) Example AR application model. (b) DAG applies subtask dependencies.
Figure 3. LT-TD3 Algorithm architecture diagram.
Figure 4. Experimental results of the average task delay for varying numbers of vehicles in the vehicle cloud.
Figure 5. Experimental results of the average task delay in the vehicle cloud by data size.
Figure 6. Experimental results of the task completion rate under different acceptable delays in vehicle cloud.
Figure 7. Experimental results of the average task delay under different edge device resources in the vehicle cloud.
Table 1. Comparison of the average task delays for varying numbers of vehicles in the vehicle cloud.
| Algorithm | 4 Vehicles | 8 Vehicles | 10 Vehicles | 16 Vehicles |
|---|---|---|---|---|
| DQN | 2.76 s | 2.79 s | 1.90 s | 1.46 s |
| PPO | 2.29 s | 2.90 s | 2.27 s | 2.13 s |
| DDPG | 2.22 s | 2.85 s | 2.08 s | 1.78 s |
| Local only | 4.72 s | 4.72 s | 4.72 s | 4.72 s |
| TD3 | 1.33 s | 1.37 s | 1.39 s | 1.28 s |
| LT-TD3 | 0.64 s | 0.83 s | 1.10 s | 1.01 s |
Table 2. Comparison of the average task delay in the vehicle cloud by data size.
| Algorithm | 70% × [10, 40] Mb | 100% × [10, 40] Mb | 120% × [10, 40] Mb |
|---|---|---|---|
| DQN | 2.53 s | 1.90 s | 2.02 s |
| PPO | 1.87 s | 2.27 s | 2.72 s |
| DDPG | 1.7 s | 2.08 s | 2.65 s |
| Local only | 2.86 s | 4.72 s | 5.58 s |
| TD3 | 1.62 s | 1.39 s | 2.08 s |
| LT-TD3 | 0.81 s | 1.10 s | 1.59 s |
Table 3. Comparison of task completion rate under different acceptable delays in the vehicle cloud.
| Algorithm | 70% × [2, 3] s | 100% × [2, 3] s | 120% × [2, 3] s |
|---|---|---|---|
| DQN | 59.6% | 75.3% | 82.2% |
| PPO | 57.5% | 70.9% | 86.3% |
| DDPG | 49.6% | 66% | 82.7% |
| Local only | 16.3% | 16.4% | 16.9% |
| TD3 | 80.3% | 95.4% | 98.5% |
| LT-TD3 | 99.2% | 98.4% | 99.1% |
Table 4. Comparison of the average task delay under different edge device resources in the vehicle cloud.
| Algorithm | 70% Resources | 100% Resources | 120% Resources |
|---|---|---|---|
| DQN | 2.12 s | 1.90 s | 1.80 s |
| PPO | 2.93 s | 2.27 s | 1.70 s |
| DDPG | 2.58 s | 2.08 s | 2.06 s |
| Local only | 4.72 s | 4.72 s | 4.72 s |
| TD3 | 2.02 s | 1.39 s | 1.49 s |
| LT-TD3 | 1.81 s | 1.10 s | 1.03 s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Peng, X.; Zhang, Y.; Zhang, X.; Zhang, C.; Yang, W. Collaborative Optimization Strategy for Dependent Task Offloading in Vehicular Edge Computing. Mathematics 2024, 12, 3820. https://doi.org/10.3390/math12233820
