Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Authors:

I-Hong Hou

MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing

Pages 71 - 80

https://doi.org/10.1145/3641512.3686381

Published: 01 October 2024

Abstract

Scheduling in multi-channel wireless communication systems presents formidable challenges in allocating resources effectively. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems, with the objective of maximizing the long-term discounted total reward while respecting resource constraints. The model also generalizes to applications beyond multi-channel wireless scheduling. We discuss the Max-Weight Index Matching algorithm, which allocates resources based on learned partial indexes, and we derive the policy gradient theorem for index learning. Our main contribution is a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial indexes by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance on three different MR-RMB problems. Our simulation results show that DIP learns the partial indexes efficiently.
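As a rough illustration of how a max-weight matching over partial indexes could allocate resources, the sketch below picks the arm-to-resource assignment with the largest total index by brute force. It is not the paper's algorithm or implementation: the function name `max_weight_index_matching`, the `index[i][k]` layout, the assumption that each arm receives at most one resource and each resource serves at most one arm, and the example values are all hypothetical, and the indexes are taken as given rather than learned.

```python
from itertools import permutations


def max_weight_index_matching(index):
    """Brute-force max-weight matching of arms to resources.

    index[i][k] is a hypothetical partial index of arm i on resource k
    (assumed given here, not learned). Returns (total, assignment), where
    assignment[k] is the arm matched to resource k, or None if idle.
    """
    n_arms = len(index)
    n_resources = len(index[0])
    # Pad with None so any resource may stay idle when no index is worth using.
    candidates = list(range(n_arms)) + [None] * n_resources
    best_total, best_choice = float("-inf"), None
    # Enumerate every way to give each resource a distinct arm (or nothing).
    for choice in permutations(candidates, n_resources):
        total = sum(index[a][k] for k, a in enumerate(choice) if a is not None)
        if total > best_total:
            best_total, best_choice = total, choice
    return best_total, list(best_choice)


# Illustrative values only: 3 arms, 2 resources.
example_index = [
    [3.0, 1.0],
    [2.5, 2.0],
    [0.5, -1.0],
]
total, assignment = max_weight_index_matching(example_index)
print(total, assignment)
```

Brute-force enumeration is only practical for very small instances; it is used here to keep the sketch dependency-free. A polynomial-time assignment solver would be the natural replacement at realistic scale.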

Index Terms

Computing methodologies → Machine learning → Learning paradigms → Reinforcement learning → Sequential decision making

Published In

MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing

October 2024

511 pages

ISBN: 9798400705212

DOI: 10.1145/3641512

General Chair: Symeon Papavassiliou
Program Chair: Stefan Schmid

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation
U.S. Army Research Office

Conference

MobiHoc '24: Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing

Sponsor: SIGMOBILE

October 14 - 17, 2024

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 296 of 1,843 submissions, 16%
