research-article

OLPart: Online Learning based Resource Partitioning for Colocating Multiple Latency-Critical Jobs on Commodity Computers

Authors:

Gang Wang

EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems

Pages 347 - 364

https://doi.org/10.1145/3552326.3567490

Published: 08 May 2023

Abstract

Colocating multiple jobs on the same server is a common approach to improving resource utilization in cloud environments. However, contention over shared resources causes performance interference, which makes resource partitioning an important research problem. Partitioning multiple resources in a coordinated way is particularly challenging when multiple latency-critical (LC) jobs are colocated with best-effort (BE) jobs, since the QoS of every LC job must be protected. So far, this problem has not been well addressed in the literature.
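One way to see why coordinated partitioning becomes hard quickly is to count the candidate configurations. The sketch below is a simplified illustration, not the paper's analysis: it assumes each resource is a pool of identical, non-overlapping units, that every job receives at least one unit of each resource, and that resources are split independently. Real mechanisms such as Intel CAT (cache ways) and MBA (bandwidth throttling levels) impose extra constraints that this ignores, and the server sizes used here are hypothetical.

```python
from math import comb


def splits(units: int, jobs: int) -> int:
    """Ways to divide `units` identical resource units among `jobs` jobs,
    giving every job at least one unit (stars and bars: C(units-1, jobs-1))."""
    if units < jobs:
        return 0
    return comb(units - 1, jobs - 1)


def joint_configs(resources: dict[str, int], jobs: int) -> int:
    """Number of joint partitions when each resource is split independently."""
    total = 1
    for units in resources.values():
        total *= splits(units, jobs)
    return total


# Hypothetical server: 10 cores, 11 LLC ways, 10 memory-bandwidth units.
server = {"cores": 10, "llc_ways": 11, "mem_bw": 10}
for jobs in (2, 3, 4):
    print(jobs, joint_configs(server, jobs))
```

With these hypothetical numbers the count grows from 810 joint configurations for two jobs to 846,720 for four, which is why exhaustively trying every partition online is impractical.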

We propose an online learning based solution, named OLPart, for partitioning resources among multiple colocated LC and BE jobs. OLPart is based on our observation that runtime performance counters can approximately indicate the resource sensitivities of jobs. Building on this finding, OLPart uses a contextual multi-armed bandit (CMAB) to design the partitioning solution, employing the performance counters to guide an intelligent exploration of the search space. Applying CMAB to the resource partitioning problem raises several critical challenges, and OLPart proposes several techniques to overcome them. OLPart requires no prior knowledge of jobs and incurs very small overhead. Evaluations demonstrate that OLPart is optimally efficient and robust, and that it outperforms state-of-the-art solutions by significant margins. OLPart is publicly available at https://github.com/crbnk/OpenOLPart.
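The abstract does not spell out OLPart's algorithm. As a rough illustration of the general CMAB idea it builds on, the sketch below implements disjoint LinUCB, a well-known CMAB algorithm that is not necessarily the one OLPart uses. Each arm stands for one candidate resource partition, the context is a vector of performance counters, and the reward is a scalar measured after applying the partition. The three counters, the four partitions, and the linear reward model are all hypothetical and synthetic.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression reward model per arm.

    Here an "arm" stands for one candidate resource partition, and the context x
    is a (hypothetical) normalized vector of performance counters.
    """

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha                                 # width of the exploration bonus
        self.A = [np.eye(dim) for _ in range(n_arms)]      # I + sum of x x^T per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # sum of reward * x per arm

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of the arm's weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)    # larger for rarely tried contexts/arms
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Toy environment (hypothetical): 3 counters, 4 candidate partitions.
# Partitions 0-2 each pay off most when their matching counter is high;
# partition 3 is never the best choice for any context.
rng = np.random.default_rng(0)
true_theta = np.array([[0.9, 0.1, 0.1],
                       [0.1, 0.9, 0.1],
                       [0.1, 0.1, 0.9],
                       [0.3, 0.3, 0.3]])

agent = LinUCB(n_arms=4, dim=3, alpha=1.0)
regret = []
for t in range(2000):
    x = rng.random(3)                                  # observed counter vector for this interval
    arm = agent.select(x)
    expected = true_theta @ x                          # expected reward of every partition
    reward = expected[arm] + rng.normal(0.0, 0.05)     # noisy measured reward
    agent.update(arm, x, reward)
    regret.append(expected.max() - expected[arm])

late_regret = float(np.mean(regret[-500:]))
print(f"mean regret over the last 500 intervals: {late_regret:.3f}")
```

The counters let the agent generalize across contexts instead of re-trying every partition from scratch each time the workload changes. A larger `alpha` explores more aggressively. The synthetic reward here says nothing about how OLPart actually scores a partition or protects LC QoS.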

Supplementary Material

PDF File (p347-chen-supp.pdf, 435.36 KB): Supplemental files.



Index Terms

1. Computing methodologies
  1. Artificial intelligence


Information

Published In

EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems

May 2023

910 pages

ISBN: 9781450394871

DOI: 10.1145/3552326

General Co-chairs: Giuseppe Antonio Di Luna (University of Rome La Sapienza), Leonardo Querzoni (University of Rome La Sapienza)

Program Co-chairs: Alexandra Fedorova (University of British Columbia), Dushyanth Narayanan (Microsoft Research)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States


Badges

Artifacts Available / v1.1


Qualifiers

Research-article

Funding Sources

National Science Foundation of China

Conference

EuroSys '23: Eighteenth European Conference on Computer Systems

May 8 - 12, 2023

Rome, Italy

Sponsor: SIGOPS

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%


Bibliometrics

Article Metrics

Total Citations: 6
Total Downloads: 617
Downloads (Last 12 months): 314
Downloads (Last 6 weeks): 43

Reflects downloads up to 01 Nov 2024

Citations

Cited By

Wang Y, Chen P, Dou H, Zhang Y, Yu G, He Z, Huang H, Filkov V, Ray B, Zhou M (2024). FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 957-969. DOI: 10.1145/3691620.3695477. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695477

Peng W, Li Y, Liu X, Wang G (2024). Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3674736. Online publication date: 24-Jun-2024.
https://dl.acm.org/doi/10.1145/3674736

Hui X, Xu Y, Guo Z, Shen X, Mencagli G, Dazzi P, Lowenthal D, Badia R (2024). ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 42-55. DOI: 10.1145/3625549.3658657. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3625549.3658657

Chen R, Peng W, Li Y, Liu X, Wang G (2023). Orchid: An Online Learning Based Resource Partitioning Framework for Job Colocation With Multiple Objectives. IEEE Transactions on Computers 72:12, 3443-3457. DOI: 10.1109/TC.2023.3303959. Online publication date: 14-Aug-2023.
https://dl.acm.org/doi/10.1109/TC.2023.3303959

Kuang C, Duan M, Lv T, Wu Y, Li L, Wang L (2023). ODRL: Reinforcement Learning in Priority Scheduling for Running Cost Optimization. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 2410-2419. DOI: 10.1109/ICPADS60453.2023.00322. Online publication date: 17-Dec-2023.
https://doi.org/10.1109/ICPADS60453.2023.00322

Penney D, Li B, Chen L, Sydir J, Drewek-Ossowicka A, Illikkal R, Tai C, Iyer R, Herdrich A (2023). RAPID. Neurocomputing 558:C. DOI: 10.1016/j.neucom.2023.126737. Online publication date: 14-Nov-2023.
https://dl.acm.org/doi/10.1016/j.neucom.2023.126737
