
Multi-Agent Deep Reinforcement Learning for Coordinated Multipoint in Mobile Networks

Published: 01 August 2023

Abstract

Macrodiversity is a key technique to increase the capacity of mobile networks. It can be realized using coordinated multipoint (CoMP), simultaneously connecting users to multiple overlapping cells. Selecting which users to serve by how many and which cells is NP-hard but needs to happen continuously in real time as users move and channel state changes. Existing approaches often require strict assumptions about or perfect knowledge of the underlying radio system, its resource allocation scheme, or user movements, none of which is readily available in practice. Instead, we propose three novel self-learning and self-adapting approaches using model-free deep reinforcement learning (DRL): DeepCoMP, DD-CoMP, and D3-CoMP. DeepCoMP leverages central control and observations of all users to select cells almost optimally. DD-CoMP and D3-CoMP use multi-agent DRL, which allows distributed, robust, and highly scalable coordination. All three approaches learn from experience and self-adapt to varying scenarios, reaching 2x higher Quality of Experience than other approaches. They have very few built-in assumptions and do not need prior system knowledge, making them more robust to change and better applicable in practice than existing approaches.
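The abstract frames CoMP cell selection as a sequential decision problem that a central controller (DeepCoMP) or each user's own agent (DD-CoMP, D3-CoMP) solves with model-free DRL. As a purely illustrative sketch of that framing, the Gymnasium-style environment below models a single user toggling connections to overlapping cells while per-cell achievable rates drift over time, with a logarithmic utility over the aggregated rate as a stand-in QoE reward. All class names, dynamics, and the reward function here are assumptions for illustration and do not reproduce the authors' DeepCoMP/DD-CoMP/D3-CoMP implementations.

    # Illustrative sketch only: a per-user CoMP cell-selection environment in the
    # spirit of the abstract. Names, dynamics, and reward are assumptions.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class CellSelectionEnv(gym.Env):
        """One user equipment (UE) decides each step which cell to toggle
        (connect/disconnect), or to do nothing, based on observed rates."""

        def __init__(self, num_cells: int = 3, episode_len: int = 100):
            super().__init__()
            self.num_cells = num_cells
            self.episode_len = episode_len
            # Action 0 = no-op; action i (1..num_cells) toggles connection to cell i-1.
            self.action_space = spaces.Discrete(num_cells + 1)
            # Observation: per-cell achievable rate (normalized) + current connections.
            self.observation_space = spaces.Box(0.0, 1.0, shape=(2 * num_cells,), dtype=np.float32)

        def _obs(self):
            return np.concatenate([self.rates, self.connected]).astype(np.float32)

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            self.t = 0
            self.rates = self.np_random.uniform(0.0, 1.0, self.num_cells)
            self.connected = np.zeros(self.num_cells, dtype=np.float32)
            return self._obs(), {}

        def step(self, action):
            if action > 0:  # toggle connection to the chosen cell
                cell = action - 1
                self.connected[cell] = 1.0 - self.connected[cell]
            # Channel quality drifts as the user moves (placeholder dynamics).
            self.rates = np.clip(
                self.rates + self.np_random.normal(0.0, 0.05, self.num_cells), 0.0, 1.0
            )
            # Toy utility: log of the rate aggregated over connected cells (QoE proxy).
            total_rate = float(np.dot(self.rates, self.connected))
            reward = float(np.log(1.0 + total_rate))
            self.t += 1
            return self._obs(), reward, False, self.t >= self.episode_len, {}

In a distributed setting along the lines of DD-CoMP and D3-CoMP, one such environment view would exist per user, with each agent observing only its own cells and acting independently; a centralized DeepCoMP-style agent would instead observe all users and output a joint cell-selection action.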

