DOI: 10.1145/3641512.3686381
research-article
Open access

Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Published: 01 October 2024

Abstract

Scheduling in multi-channel wireless communication systems poses formidable challenges for effective resource allocation. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems, with the objective of maximizing the long-term discounted total reward while respecting resource constraints. We also generalize the model to applications beyond multi-channel wireless communication. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes, and we derive the policy gradient theorem for index learning. Our main contribution is a new Deep Index Policy (DIP), an online learning algorithm tailored to MR-RMB. DIP learns the partial indexes by leveraging the policy gradient theorem for restless arms whose transition kernels over heterogeneous resources are convoluted and unknown. We demonstrate the utility of DIP by evaluating its performance on three different MR-RMB problems. Our simulation results show that DIP learns the partial indexes efficiently.
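The abstract does not spell out how Max-Weight Index Matching computes its assignment. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: it assumes the learned partial index of arm i on resource j is a scalar `index[i][j]`, that each arm receives at most one resource and each resource serves at most one arm, and that a resource may be left idle. Under those assumptions the step is a maximum-weight bipartite matching between arms and resources. The function name `max_weight_index_matching` is hypothetical, and brute-force enumeration stands in for an efficient matching solver, so it only suits small instances.

```python
from itertools import permutations
from typing import Optional, Sequence


def max_weight_index_matching(
    index: Sequence[Sequence[float]],
) -> tuple[float, list[Optional[int]]]:
    """Hypothetical sketch: match arms to resources to maximize total partial index.

    index[i][j] is an (assumed, already learned) partial index of arm i on resource j.
    Returns (total_index, assignment), where assignment[j] is the arm served by
    resource j, or None if resource j is left idle (an assumption of this sketch).
    Brute force over all one-to-one assignments; exponential in the problem size.
    """
    n_arms = len(index)
    n_res = len(index[0]) if n_arms else 0
    # Pad the candidate list with one "idle" slot per resource, so every
    # resource may stay unused and a negative index is never forced.
    candidates: list[Optional[int]] = list(range(n_arms)) + [None] * n_res
    best_total: float = 0.0
    best_assign: list[Optional[int]] = [None] * n_res
    # Each permutation gives resource j the candidate choice[j]; arm ids are
    # unique in `candidates`, so no arm is assigned to two resources.
    for choice in permutations(candidates, n_res):
        total = sum(index[arm][j] for j, arm in enumerate(choice) if arm is not None)
        if total > best_total:
            best_total, best_assign = total, list(choice)
    return best_total, best_assign


if __name__ == "__main__":
    # Rows are arms, columns are resources (illustrative numbers only).
    print(max_weight_index_matching([[3.0, 1.0], [2.0, 4.0], [0.5, 0.2]]))  # (7.0, [0, 1])
    print(max_weight_index_matching([[-1.0, 2.0]]))  # (2.0, [None, 0])
    print(max_weight_index_matching([[5.0, 4.0], [4.0, 0.0]]))  # (8.0, [1, 0])
```

In the last call, first committing the single largest index (arm 0 on resource 0, value 5.0) caps the total at 5.0, whereas the joint matching reaches 8.0. For realistic sizes, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would replace the enumeration.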

    Published In

    MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
    October 2024
    511 pages
    ISBN:9798400705212
    DOI:10.1145/3641512
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. online learning
    2. multi-armed bandit
    3. multi-channel scheduling

    Conference

    MobiHoc '24

    Acceptance Rates

    Overall Acceptance Rate 296 of 1,843 submissions, 16%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 62
    • Downloads (last 12 months): 62
    • Downloads (last 6 weeks): 30
    Reflects downloads up to 25 Dec 2024
