QoS policies and architecture for cache/memory in CMP platforms

Authors: Ramesh Illikkal, Srihari Makineni, Steve Reinhardt

SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Pages 25 - 36

https://doi.org/10.1145/1254882.1254886

Published: 12 June 2007

Abstract

As we enter the era of CMP platforms with multiple threads/cores on the die, the diversity of the simultaneous workloads running on them is expected to increase. The rapid deployment of virtualization as a means to consolidate workloads onto a single platform is a prime example of this trend. In such scenarios, the quality of service (QoS) that each individual workload gets from the platform can vary widely depending on the behavior of the simultaneously running workloads. While the number of cores assigned to each workload can be controlled, there is no hardware or software support in today's platforms to control allocation of platform resources such as cache space and memory bandwidth to individual workloads. In this paper, we propose a QoS-enabled memory architecture for CMP platforms that addresses this problem. The QoS-enabled memory architecture enables more cache resources (i.e., space) and memory resources (i.e., bandwidth) for high-priority applications based on guidance from the operating environment. The architecture also allows dynamic resource reassignment during run-time to further optimize the performance of the high-priority application with minimal degradation to low-priority applications. To achieve these goals, we describe the hardware/software support required in the platform as well as the operating environment (O/S and virtual machine monitor). Our evaluation framework consists of detailed platform simulation models and a QoS-enabled version of Linux. Based on evaluation experiments, we show the effectiveness of a QoS-enabled architecture and summarize key findings/trade-offs.
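
The two ideas in the abstract, a priority-based split of cache space and run-time reassignment, can be illustrated with a minimal Python sketch. This is not the mechanism from the paper: partitioning by cache ways, the `high_share` knob, the miss-rate trigger, and the `low_floor` guard are all assumptions made for illustration, and every name below is hypothetical.

```python
def static_partition(total_ways, high_share):
    """Split cache ways between the high- and low-priority classes.

    high_share is an assumed policy knob (fraction of ways reserved for
    the high-priority class); the abstract does not define such a value.
    """
    high_ways = int(total_ways * high_share)
    return high_ways, total_ways - high_ways


def rebalance(high_ways, low_ways, high_miss_rate, target_miss_rate, low_floor=1):
    """Move one way from low to high priority when it appears to be needed.

    A way is moved only if the high-priority miss rate exceeds the
    (assumed) target and the low-priority class keeps at least low_floor
    ways, which stands in for "minimal degradation to low priority".
    """
    if high_miss_rate > target_miss_rate and low_ways > low_floor:
        return high_ways + 1, low_ways - 1
    return high_ways, low_ways


# Start with a 16-way shared cache and reserve three quarters for high priority.
high, low = static_partition(16, 0.75)
# High-priority workload is missing more than the target, so it gains a way.
high, low = rebalance(high, low, high_miss_rate=0.30, target_miss_rate=0.20)
```

The floor on the low-priority allocation is one possible way to bound how much the low-priority workload can lose; a real policy could just as well use bandwidth or performance targets instead of miss rate.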

Index Terms

QoS policies and architecture for cache/memory in CMP platforms
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

June 2007

398 pages

ISBN:9781595936394

DOI:10.1145/1254882

General Chair: Leana Golubchik (University of Southern California, USA)
Program Chairs: Mostafa Ammar (Georgia Institute of Technology, USA), Mor Harchol-Balter (Carnegie Mellon University, USA)

ACM SIGMETRICS Performance Evaluation Review Volume 35, Issue 1
SIGMETRICS '07 Conference Proceedings
June 2007
382 pages
ISSN:0163-5999
DOI:10.1145/1269899

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMETRICS07: ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

June 12 - 16, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%
