Proximal Policy Optimization-Based Hierarchical Decision-Making Mechanism for Resource Allocation Optimization in UAV Networks
Abstract
1. Introduction
2. Related Works
3. System Model
3.1. Network Model
3.2. Link Model
3.3. Problem Formulation
4. Time-Frequency Resource Allocation Based on PPO
4.1. Algorithm Formulation
Algorithm 1. The PPO-based Algorithm for Multi-UAV Systems Resource Allocation
4.1.1. State Space
4.1.2. Action Space
4.1.3. Reward Function
4.2. Proximal Policy Optimization
4.3. Time-Frequency Resource Allocation Based on PPO
5. Simulation Results
5.1. Simulation Settings
5.2. Performance Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
UAV | Unmanned aerial vehicle |
PPO | Proximal Policy Optimization |
BS | Base station |
UAV-BS | Unmanned aerial vehicle base station |
Parameter | Value |
---|---|
Lower frequency | 1440 MHz |
Upper frequency | 1443 MHz |
Number of time slots | 10 |
Transmission power | 1 W |
Area of the region | |
Power spectral density | W/Hz |
Number of iterations | 900 |
Policy network learning rate | |
Value network learning rate | |
Clipping parameter | 0.2 |
Training epochs | 10 |
Number of hidden layers | 256 |
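
To make the role of these settings concrete, the following is a minimal sketch, not the authors' implementation. It collects the values that survive in the table into a configuration object and shows how the clipping parameter of 0.2 enters the standard per-sample PPO clipped surrogate, min(r·A, clip(r, 1−ε, 1+ε)·A). The names `SimulationSettings` and `clipped_surrogate` and all field names are hypothetical. The two learning rates are missing from the table, so they are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimulationSettings:
    """Hypothetical container for the simulation settings listed in the table."""
    lower_frequency_mhz: float = 1440.0
    upper_frequency_mhz: float = 1443.0
    num_time_slots: int = 10
    transmission_power_w: float = 1.0
    num_iterations: int = 900
    clip_epsilon: float = 0.2
    training_epochs: int = 10
    # The table labels this entry "Number of hidden layers"; it may instead
    # denote the layer width. That reading is an assumption.
    hidden: int = 256
    # Missing from the table, so deliberately left unset.
    policy_lr: Optional[float] = None
    value_lr: Optional[float] = None

    @property
    def bandwidth_mhz(self) -> float:
        """Width of the band between the lower and upper frequency."""
        return self.upper_frequency_mhz - self.lower_frequency_mhz


def clipped_surrogate(ratio: float, advantage: float, epsilon: float) -> float:
    """Standard PPO clipped objective for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the advantage estimate.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


settings = SimulationSettings()
print("bandwidth (MHz):", settings.bandwidth_mhz)
# Positive advantage with a large ratio: the gain is capped at (1 + eps) * A.
print(clipped_surrogate(1.5, 2.0, settings.clip_epsilon))
# Negative advantage with a small ratio: the penalty is held at (1 - eps) * A.
print(clipped_surrogate(0.5, -1.0, settings.clip_epsilon))
```

With ε = 0.2, the probability ratio stops contributing to the objective outside the interval [0.8, 1.2]. This is what keeps each of the 10 training epochs per iteration from moving the policy too far from the one that collected the data.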
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, K.; Yang, J.; Li, J.; Yang, B.; Ding, S. Proximal Policy Optimization-Based Hierarchical Decision-Making Mechanism for Resource Allocation Optimization in UAV Networks. Electronics 2025, 14, 747. https://doi.org/10.3390/electronics14040747