
Multi-Agent Deep Reinforcement Learning for Coordinated Multipoint in Mobile Networks

Published: 01 August 2023

Abstract

Macrodiversity is a key technique to increase the capacity of mobile networks. It can be realized using coordinated multipoint (CoMP), simultaneously connecting users to multiple overlapping cells. Selecting which users to serve by how many and which cells is NP-hard but needs to happen continuously in real time as users move and channel state changes. Existing approaches often require strict assumptions about or perfect knowledge of the underlying radio system, its resource allocation scheme, or user movements, none of which is readily available in practice. Instead, we propose three novel self-learning and self-adapting approaches using model-free deep reinforcement learning (DRL): DeepCoMP, DD-CoMP, and D3-CoMP. DeepCoMP leverages central control and observations of all users to select cells almost optimally. DD-CoMP and D3-CoMP use multi-agent DRL, which allows distributed, robust, and highly scalable coordination. All three approaches learn from experience and self-adapt to varying scenarios, reaching 2x higher Quality of Experience than other approaches. They have very few built-in assumptions and do not need prior system knowledge, making them more robust to change and better applicable in practice than existing approaches.
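The abstract frames CoMP cell selection as a sequential decision problem that a central controller (DeepCoMP) or each user's own agent (DD-CoMP, D3-CoMP) solves with model-free DRL. As a purely illustrative sketch of that framing, the Gymnasium-style environment below models a single user toggling connections to overlapping cells while per-cell achievable rates drift over time, with a logarithmic utility over the aggregated rate as a stand-in QoE reward. All class names, dynamics, and the reward function here are assumptions for illustration and do not reproduce the authors' DeepCoMP/DD-CoMP/D3-CoMP implementations.

    # Illustrative sketch only: a per-user CoMP cell-selection environment in the
    # spirit of the abstract. Names, dynamics, and reward are assumptions.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class CellSelectionEnv(gym.Env):
        """One user equipment (UE) decides each step which cell to toggle
        (connect/disconnect), or to do nothing, based on observed rates."""

        def __init__(self, num_cells: int = 3, episode_len: int = 100):
            super().__init__()
            self.num_cells = num_cells
            self.episode_len = episode_len
            # Action 0 = no-op; action i (1..num_cells) toggles connection to cell i-1.
            self.action_space = spaces.Discrete(num_cells + 1)
            # Observation: per-cell achievable rate (normalized) + current connections.
            self.observation_space = spaces.Box(0.0, 1.0, shape=(2 * num_cells,), dtype=np.float32)

        def _obs(self):
            return np.concatenate([self.rates, self.connected]).astype(np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.t = 0
            self.rates = self.np_random.uniform(0.0, 1.0, self.num_cells)
            self.connected = np.zeros(self.num_cells, dtype=np.float32)
            return self._obs(), {}

        def step(self, action):
            if action > 0:  # toggle connection to the chosen cell
                cell = action - 1
                self.connected[cell] = 1.0 - self.connected[cell]
            # Channel quality drifts as the user moves (placeholder dynamics).
            self.rates = np.clip(
                self.rates + self.np_random.normal(0.0, 0.05, self.num_cells), 0.0, 1.0
            )
            # Toy utility: log of the rate aggregated over connected cells (QoE proxy).
            total_rate = float(np.dot(self.rates, self.connected))
            reward = float(np.log(1.0 + total_rate))
            self.t += 1
            return self._obs(), reward, False, self.t >= self.episode_len, {}

In a distributed setting along the lines of DD-CoMP and D3-CoMP, one such environment view would exist per user, with each agent observing only its own cells and acting independently; a centralized DeepCoMP-style agent would instead observe all users and output a joint cell-selection action.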

