Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems

Published: 25 March 2016

Abstract

Latency-critical applications suffer from both average performance degradation and reduced completion time predictability when collocated with batch tasks. Such variation forces the system to overprovision resources to ensure Quality of Service (QoS) for latency-critical tasks, degrading overall system throughput. We explore the causes of this variation and exploit opportunities to mitigate it directly, improving both QoS and utilization at the same time. We develop, implement, and evaluate Dirigent, a lightweight performance-management runtime system that accurately controls the QoS of latency-critical applications at fine time scales, leveraging existing architecture mechanisms. We evaluate Dirigent on a real machine and show that it is significantly more effective than configurations representative of prior schemes.

    Published In

    ACM SIGARCH Computer Architecture News, Volume 44, Issue 2
    ASPLOS'16
    May 2016
    774 pages
    ISSN: 0163-5964
    DOI: 10.1145/2980024
    • ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
      March 2016
      824 pages
      ISBN: 9781450340915
      DOI: 10.1145/2872362
      • General Chair: Tom Conte
      • Program Chair: Yuanyuan Zhou
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 March 2016
    Published in SIGARCH Volume 44, Issue 2

    Author Tags

    1. DVFS
    2. QoS
    3. cache partitioning
    4. colocation
    5. latency distribution
    6. latency tail
    7. run-time prediction

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 79
    • Downloads (last 6 weeks): 7
    Reflects downloads up to 24 Dec 2024

    Cited By

    • (2023) Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing. ACM Transactions on Computer Systems 42:1-2 (1-37). DOI: 10.1145/3630006. Online publication date: 18-Nov-2023
    • (2022) MISO. Proceedings of the 13th Symposium on Cloud Computing (173-189). DOI: 10.1145/3542929.3563510. Online publication date: 7-Nov-2022
    • (2021) PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (1282-1295). DOI: 10.1145/3466752.3480101. Online publication date: 18-Oct-2021
    • (2021) Rusty: Runtime Interference-Aware Predictive Monitoring for Modern Multi-Tenant Systems. IEEE Transactions on Parallel and Distributed Systems 32:1 (184-198). DOI: 10.1109/TPDS.2020.3013948. Online publication date: 1-Jan-2021
    • (2021) Satori. Proceedings of the 48th Annual International Symposium on Computer Architecture (292-305). DOI: 10.1109/ISCA52012.2021.00031. Online publication date: 14-Jun-2021
    • (2021) Don't forget the I/O when allocating your LLC. Proceedings of the 48th Annual International Symposium on Computer Architecture (112-125). DOI: 10.1109/ISCA52012.2021.00018. Online publication date: 14-Jun-2021
    • (2020) Rhythm. Proceedings of the Fifteenth European Conference on Computer Systems (1-17). DOI: 10.1145/3342195.3387534. Online publication date: 15-Apr-2020
    • (2020) CLITE: Efficient and QoS-Aware Co-Location of Multiple Latency-Critical Jobs for Warehouse Scale Computers. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) (193-206). DOI: 10.1109/HPCA47549.2020.00025. Online publication date: Feb-2020
    • (2020) Towards Hybrid Isolation for Shared Multicore Systems. Job Scheduling Strategies for Parallel Processing (25-44). DOI: 10.1007/978-3-030-63171-0_2. Online publication date: 22-May-2020
    • (2019) Fair-EDF. Proceedings of the 11th USENIX Conference on Hot Topics in Storage and File Systems (6-6). DOI: 10.5555/3357062.3357070. Online publication date: 8-Jul-2019
