Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems

Authors:

Mattan Erez

ACM SIGARCH Computer Architecture News, Volume 44, Issue 2

Pages 33 - 47

https://doi.org/10.1145/2980024.2872394

Published: 25 March 2016

Abstract

Latency-critical applications suffer from both average performance degradation and reduced completion-time predictability when collocated with batch tasks. Such variation forces the system to overprovision resources to ensure Quality of Service (QoS) for latency-critical tasks, degrading overall system throughput. We explore the causes of this variation and exploit the opportunity to mitigate it directly, simultaneously improving both QoS and utilization. We develop, implement, and evaluate Dirigent, a lightweight performance-management runtime system that accurately controls the QoS of latency-critical applications at fine time scales, leveraging existing architecture mechanisms. We evaluate Dirigent on a real machine and show that it is significantly more effective than configurations representative of prior schemes.

Index Terms

1. Computer systems organization

Published In

ACM SIGARCH Computer Architecture News Volume 44, Issue 2

ASPLOS'16

May 2016

774 pages

ISSN:0163-5964

DOI:10.1145/2980024

Editor:
Doug DeGroot
acm dot org

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN:9781450340915
DOI:10.1145/2872362
General Chair: Tom Conte, Georgia Tech, USA
Program Chair: Yuanyuan Zhou, University of California, San Diego, USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States
