
Resource Management with Deep Reinforcement Learning

Published: 09 November 2016
DOI: 10.1145/3005745.3005750
Abstract

    Resource management problems in systems and networking often manifest as difficult online decision making tasks where appropriate solutions depend on understanding the workload and environment. Inspired by recent advances in deep reinforcement learning for AI problems, we consider building systems that learn to manage resources directly from experience. We present DeepRM, an example solution that translates the problem of packing tasks with multiple resource demands into a learning problem. Our initial results show that DeepRM performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight.
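To ground the abstract's framing, here is a minimal policy-gradient (REINFORCE) sketch of the general technique it describes: an agent that learns, from reward alone, to pack jobs with multiple resource demands. Everything below is an illustrative assumption rather than the authors' DeepRM implementation: the ToyCluster environment, the flattened state encoding, the waiting-jobs reward, and the linear softmax policy (a stand-in for a deep network) are all hypothetical.

```python
# Illustrative toy only -- NOT the paper's DeepRM code. The environment,
# state encoding, reward, and policy here are assumptions chosen to show
# the technique: policy-gradient RL applied to multi-resource packing.
import numpy as np

rng = np.random.default_rng(0)

N_RES = 2                       # resource types (e.g. CPU, memory)
N_SLOTS = 5                     # pending-job slots visible to the agent
N_ACTIONS = N_SLOTS + 1         # schedule one slot, or do nothing
STATE_DIM = N_RES + N_SLOTS * N_RES


def softmax(z):
    z = z - z.max()             # numerical stability
    e = np.exp(z)
    return e / e.sum()


class ToyCluster:
    """One machine; each pending job demands a vector of resources."""

    def reset(self):
        self.free = np.ones(N_RES)                            # free capacity
        self.jobs = rng.uniform(0.1, 0.5, (N_SLOTS, N_RES))   # job demands
        self.done = np.zeros(N_SLOTS, dtype=bool)
        return self.obs()

    def obs(self):
        # State: free capacity plus the demand vector of every unscheduled job.
        pending = np.where(self.done[:, None], 0.0, self.jobs)
        return np.concatenate([self.free, pending.ravel()])

    def step(self, a):
        # Place job `a` if it is a real slot, still pending, and fits.
        if a < N_SLOTS and not self.done[a] and np.all(self.jobs[a] <= self.free):
            self.free -= self.jobs[a]
            self.done[a] = True
        reward = -float((~self.done).sum())   # penalize jobs left waiting
        return self.obs(), reward, bool(self.done.all())


W = np.zeros((N_ACTIONS, STATE_DIM))   # linear softmax policy parameters


def run_episode(max_steps=20):
    env, traj = ToyCluster(), []
    s = env.reset()
    for _ in range(max_steps):
        probs = softmax(W @ s)
        a = rng.choice(N_ACTIONS, p=probs)   # sample action from the policy
        s_next, r, done = env.step(a)
        traj.append((s, a, r))
        s = s_next
        if done:
            break
    return traj


# REINFORCE: raise the log-probability of each action in proportion to how
# much its discounted return exceeded a simple per-episode baseline.
gamma, lr = 0.99, 0.01
for _ in range(300):
    traj = run_episode()
    returns, G = [], 0.0
    for _, _, r in reversed(traj):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    baseline = np.mean(returns)
    for (s, a, _), G in zip(traj, returns):
        probs = softmax(W @ s)
        grad = -np.outer(probs, s)      # d/dW of log softmax: (e_a - probs) s^T
        grad[a] += s
        W += lr * (G - baseline) * grad
```

DeepRM itself replaces this linear policy with a deep neural network over a richer view of cluster state, as the abstract indicates; the toy above only mirrors the overall shape of learning a packing policy directly from experience.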





    Published In

    HotNets '16: Proceedings of the 15th ACM Workshop on Hot Topics in Networks
    November 2016
    217 pages
ISBN: 9781450346610
DOI: 10.1145/3005745
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • Research-article

    Conference

    HotNets-XV

    Acceptance Rates

HotNets '16 Paper Acceptance Rate: 30 of 108 submissions, 28%
Overall Acceptance Rate: 110 of 460 submissions, 24%


    Article Metrics

• Downloads (last 12 months): 785
• Downloads (last 6 weeks): 58
Reflects downloads up to 27 Jul 2024


    Cited By

• (2024) "Artificial Intelligence in IoT Security: Review of Advancements, Challenges, and Future Directions". International Journal of Innovative Technology and Exploring Engineering, 13(7), 14-20. DOI: 10.35940/ijitee.G9911.13070624. Online publication date: 30-Jun-2024.
• (2024) "Solving Combinatorial Optimization Problems with Deep Neural Network: A Survey". Tsinghua Science and Technology, 29(5), 1266-1282. DOI: 10.26599/TST.2023.9010076. Online publication date: Oct-2024.
• (2024) "DeepCTS: A Deep Reinforcement Learning Approach for AI Container Task Scheduling". Proceedings of the 2024 3rd Asia Conference on Algorithms, Computing and Machine Learning, 342-347. DOI: 10.1145/3654823.3654885. Online publication date: 22-Mar-2024.
• (2024) "Trustworthy and Efficient Digital Twins in Post-Quantum Era with Hybrid Hardware-Assisted Signatures". ACM Transactions on Multimedia Computing, Communications, and Applications, 20(6), 1-30. DOI: 10.1145/3638250. Online publication date: 8-Mar-2024.
• (2024) "SPRING: Improving the Throughput of Sharding Blockchain via Deep Reinforcement Learning Based State Placement". Proceedings of the ACM on Web Conference 2024, 2836-2846. DOI: 10.1145/3589334.3645386. Online publication date: 13-May-2024.
• (2024) "Autonomous On-Device Protocols: Empowering Wireless with Self-Driven Capabilities". 2024 IEEE Wireless Communications and Networking Conference (WCNC), 1-6. DOI: 10.1109/WCNC57260.2024.10571037. Online publication date: 21-Apr-2024.
• (2024) "Privacy-Preserving Deployment Mechanism for Service Function Chains Across Multiple Domains". IEEE Transactions on Network and Service Management, 21(1), 1241-1256. DOI: 10.1109/TNSM.2023.3311587. Online publication date: Feb-2024.
• (2024) "Multi-Agent Deep Reinforcement Learning for Coordinated Multipoint in Mobile Networks". IEEE Transactions on Network and Service Management, 21(1), 908-924. DOI: 10.1109/TNSM.2023.3300962. Online publication date: Feb-2024.
• (2024) "Towards Dynamic Request Updating With Elastic Scheduling for Multi-Tenant Cloud-Based Data Center Network". IEEE Transactions on Network Science and Engineering, 11(2), 2223-2237. DOI: 10.1109/TNSE.2023.3341907. Online publication date: Mar-2024.
• (2024) "Understanding via Exploration: Discovery of Interpretable Features With Deep Reinforcement Learning". IEEE Transactions on Neural Networks and Learning Systems, 35(2), 1696-1707. DOI: 10.1109/TNNLS.2022.3184956. Online publication date: Feb-2024.
