Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Multi-Agent DRL-Based Dynamic Task Offloading in D2D-MEC Network to Minimize Average Task Delay with Deadline Constraints

Version 1: Received: 7 June 2024 / Approved: 7 June 2024 / Online: 7 June 2024 (10:51:36 CEST)

A peer-reviewed article of this Preprint also exists.

He, H.; Yang, X.; Mi, X.; Shen, H.; Liao, X. Multi-Agent Deep Reinforcement Learning Based Dynamic Task Offloading in a Device-to-Device Mobile-Edge Computing Network to Minimize Average Task Delay with Deadline Constraints. Sensors 2024, 24, 5141.

Abstract

Device-to-Device (D2D) communication is a pivotal technology for the next generation of networks, allowing direct task offloading between mobile devices (MDs) to exploit otherwise idle resources. This paper proposes a novel algorithm for dynamic task offloading between active and idle MDs in a D2D-MEC (mobile edge computing) system, deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D-MEC system that accounts for stochastic task arrivals and multi-time-slot task execution, aspects insufficiently addressed in the existing literature. We adopt a queue-based system model to formulate the dynamic task offloading optimization problem. To address the challenges of a large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and apply multi-agent DRL through Multi-Agent Proximal Policy Optimization (MAPPO). We employ a centralized-training, decentralized-execution (CTDE) framework so that each MD makes offloading decisions based solely on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm: compared with existing sub-optimal schemes deploying single-agent DRL, it reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%.
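To make the CTDE structure described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: each MD's actor maps only its local observation to an offloading action, a single centralized critic sees the joint state during training, and a per-agent PPO clipped surrogate stands in for the MAPPO objective. The class names, dimensions, observation contents, and action set (local execution, edge server, or an idle MD) are all illustrative assumptions.

```python
# Minimal CTDE / MAPPO sketch (illustrative; not the paper's code).
import torch
import torch.nn as nn
from torch.distributions import Categorical

OBS_DIM = 8     # assumed local-state size per MD (queue lengths, deadlines, ...)
N_AGENTS = 4    # assumed number of MDs
N_ACTIONS = 6   # assumed offloading choices: local, edge server, or an idle MD

class Actor(nn.Module):
    """Decentralized policy: one per MD, conditioned only on its local state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, local_obs):
        return Categorical(logits=self.net(local_obs))

class CentralCritic(nn.Module):
    """Centralized value function: sees all agents' states, training-time only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM * N_AGENTS, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)

def mappo_actor_loss(actor, local_obs, actions, old_log_probs, advantages, clip=0.2):
    """Standard PPO clipped surrogate, applied per agent (the MAPPO objective)."""
    dist = actor(local_obs)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Decentralized execution: at run time each MD samples an action from its own
# observation alone -- no global state or inter-agent communication is needed.
actors = [Actor() for _ in range(N_AGENTS)]
local_obs = torch.randn(N_AGENTS, OBS_DIM)   # placeholder observations
actions = [a(o).sample() for a, o in zip(actors, local_obs)]
```

The split mirrors the abstract's claim: the critic's global view is used only to compute advantages during training, so each deployed MD needs nothing beyond its local system state.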

Keywords

mobile edge computing; dynamic matching; D2D; delay constraint; multi-agent reinforcement learning

Subject

Computer Science and Mathematics, Computer Networks and Communications


