DOI: 10.1145/3552326.3567490
research-article

OLPart: Online Learning based Resource Partitioning for Colocating Multiple Latency-Critical Jobs on Commodity Computers

Published: 08 May 2023

Abstract

Colocating multiple jobs on the same server is a common approach to improving resource utilization in cloud environments. However, performance interference caused by contention for shared resources makes resource partitioning an important research problem. Partitioning multiple resources in a coordinated manner is particularly challenging when multiple latency-critical (LC) jobs are colocated with best-effort (BE) jobs, since QoS must be protected for all of the LC jobs. So far, this problem has not been well addressed in the literature.
We propose an online-learning-based solution, named OLPart, for partitioning resources among multiple colocated LC jobs and BE jobs. OLPart is designed around our observation that runtime performance counters can approximately indicate the resource sensitivities of jobs. Based on this finding, OLPart leverages a contextual multi-armed bandit (CMAB) to design the partitioning solution, which employs the performance counters to enable intelligent exploration of the search space. Applying CMAB to the resource partitioning problem raises several critical challenges, and OLPart proposes several techniques to overcome them. OLPart does not require prior knowledge of jobs and incurs very small overhead. Evaluations demonstrate that OLPart is optimally efficient and robust, and that it outperforms state-of-the-art solutions by significant margins. OLPart is publicly available at https://github.com/crbnk/OpenOLPart.
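To make the CMAB idea concrete, the following is a minimal sketch only, not OLPart's implementation. The abstract does not name a specific bandit algorithm, so the sketch assumes disjoint LinUCB, a standard contextual bandit. Each arm stands for a candidate partition, and the context is a feature vector that could be derived from performance counters. The names `LinUCBArm`, `choose_arm`, `PARTITIONS`, `TRUE_WEIGHTS`, `toy_reward`, and `simulate`, as well as the partition values and the reward model, are all hypothetical.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression state for one arm of disjoint LinUCB (assumed algorithm)."""

    def __init__(self, dim):
        # Inverse of the design matrix A, starting from the ridge prior A = I.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward for context x plus alpha times the confidence width."""
        theta = self._A_inv_times(self.b)
        Ax = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        """Rank-one update of A_inv via the Sherman-Morrison formula."""
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]


def choose_arm(arms, context, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(context, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


# Hypothetical arms: (cores, LLC ways) granted to an LC job.
PARTITIONS = [(2, 4), (4, 8), (6, 12)]
# Made-up linear reward model over the context [bias, cache_miss_rate].
TRUE_WEIGHTS = [[0.8, -0.6], [0.5, 0.0], [0.2, 0.6]]


def toy_reward(arm_index, context):
    """Synthetic reward; a real system would measure QoS and throughput instead."""
    return sum(w * c for w, c in zip(TRUE_WEIGHTS[arm_index], context))


def simulate(rounds=200, alpha=0.5, seed=0):
    """Run the bandit loop on synthetic contexts and return (arms, pull counts)."""
    rng = random.Random(seed)
    arms = [LinUCBArm(2) for _ in PARTITIONS]
    pulls = [0] * len(PARTITIONS)
    for _ in range(rounds):
        context = [1.0, rng.random()]  # stand-in for normalized counter readings
        k = choose_arm(arms, context, alpha)
        arms[k].update(context, toy_reward(k, context))
        pulls[k] += 1
    return arms, pulls


if __name__ == "__main__":
    _, pulls = simulate()
    for partition, count in zip(PARTITIONS, pulls):
        print(partition, count)
```

In this sketch the parameter `alpha` trades exploration against exploitation: larger values favor partitions whose reward estimate is still uncertain for the observed context, while `alpha=0` always picks the partition with the highest estimated reward.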

Supplementary Material

PDF File (p347-chen-supp.pdf)
Supplemental files.


    Published In

    EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems
    May 2023
    910 pages
    ISBN:9781450394871
    DOI:10.1145/3552326
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. job colocating
    2. performance interference
    3. resource partitioning
    4. online learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation of China

    Conference

    EuroSys '23
    Acceptance Rates

    Overall Acceptance Rate 241 of 1,308 submissions, 18%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 314
    • Downloads (last 6 weeks): 43
    Reflects downloads up to 01 Nov 2024

    Cited By

    • (2024) FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 957-969. DOI: 10.1145/3691620.3695477. Online publication date: 27-Oct-2024
    • (2024) Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3674736. Online publication date: 24-Jun-2024
    • (2024) ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 42-55. DOI: 10.1145/3625549.3658657. Online publication date: 3-Jun-2024
    • (2023) Orchid: An Online Learning Based Resource Partitioning Framework for Job Colocation With Multiple Objectives. IEEE Transactions on Computers 72:12, 3443-3457. DOI: 10.1109/TC.2023.3303959. Online publication date: 14-Aug-2023
    • (2023) ODRL: Reinforcement Learning in Priority Scheduling for Running Cost Optimization. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 2410-2419. DOI: 10.1109/ICPADS60453.2023.00322. Online publication date: 17-Dec-2023
    • (2023) RAPID. Neurocomputing 558:C. DOI: 10.1016/j.neucom.2023.126737. Online publication date: 14-Nov-2023
