FlexDCP: a QoS framework for CMP architectures

Authors:

Francisco J. Cazorla,

Rizos Sakellariou,

Mateo Valero

ACM SIGOPS Operating Systems Review, Volume 43, Issue 2

Pages 86 - 96

https://doi.org/10.1145/1531793.1531806

Published: 21 April 2009

Abstract

Current multicore architectures offer high throughput by increasing hardware resource utilization. As the number of cores in a multicore system increases, providing Quality of Service (QoS) to applications in addition to throughput is becoming an important problem.

In this work, we present FlexDCP, a framework that allows the Operating System (OS) to guarantee a QoS level for each application running on a chip multiprocessor (CMP). FlexDCP directly estimates the performance of applications under different cache configurations instead of relying on indirect measures of performance such as the number of misses. This information allows the OS to convert QoS requirements into resource assignments. Consequently, the OS gains flexibility: it can optimize different QoS metrics, such as per-application performance, or global performance metrics such as fairness, weighted speedup or throughput.
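The conversion from QoS requirements to resource assignments described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a way-partitioned shared cache, assumes that a per-application estimate of IPC for every possible number of ways is already available, and expresses each QoS target as a fraction of the application's best estimated IPC. The function names, the data layout and the numbers are all hypothetical.

```python
from typing import Dict, List, Optional


def min_ways_for_target(ipc_by_ways: List[float], target_fraction: float) -> int:
    """Return the smallest number of cache ways whose estimated IPC reaches
    target_fraction of the best estimated IPC.

    ipc_by_ways[k - 1] is the (hypothetical) estimated IPC with k ways.
    """
    if not ipc_by_ways:
        raise ValueError("need at least one estimate")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    required_ipc = target_fraction * max(ipc_by_ways)
    for ways, ipc in enumerate(ipc_by_ways, start=1):
        if ipc >= required_ipc:
            return ways
    # Not reachable: the maximum entry always satisfies the condition.
    return len(ipc_by_ways)


def assign_ways(
    estimates: Dict[str, List[float]],
    targets: Dict[str, float],
    total_ways: int,
) -> Optional[Dict[str, int]]:
    """Map per-application QoS targets to a way assignment.

    Returns None when the targets cannot all be met with total_ways.
    Spare ways are left unassigned in this sketch.
    """
    needed = {
        app: min_ways_for_target(curve, targets[app])
        for app, curve in estimates.items()
    }
    if sum(needed.values()) > total_ways:
        return None
    return needed


if __name__ == "__main__":
    # Invented estimates for two applications sharing an 8-way cache.
    estimates = {
        "A": [0.5, 0.8, 0.9, 0.95, 1.0, 1.0, 1.0, 1.0],
        "B": [1.0, 1.2, 1.3, 1.35, 1.4, 1.4, 1.4, 1.4],
    }
    print(assign_ways(estimates, {"A": 0.85, "B": 0.8}, total_ways=8))
```

In this sketch any ways left over after the per-application targets are met remain unassigned; a policy that optimizes a global metric could distribute them.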

Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, falling on average 1.48% short of the objective in the remaining cases. When optimizing a global QoS metric like fairness, FlexDCP consistently outperforms traditional eviction policies such as LRU and pseudo-LRU, as well as previous dynamic cache partitioning proposals, for two-, four- and eight-core configurations. In an eight-core architecture, FlexDCP obtains a fairness improvement of 10.1% over Fair, the best policy in the literature optimizing fairness.
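The abstract names throughput, weighted speedup and fairness as global metrics but does not define them. The sketch below uses definitions that are common in the multithreading literature: throughput as the sum of IPCs, weighted speedup as the sum of each application's IPC relative to its IPC when running alone, and fairness as the harmonic mean of those relative IPCs. Whether FlexDCP uses exactly these definitions is an assumption, and the function names and sample values are hypothetical.

```python
from typing import List, Sequence


def relative_ipcs(shared: Sequence[float], alone: Sequence[float]) -> List[float]:
    """IPC of each application in the shared run divided by its IPC alone."""
    if len(shared) != len(alone) or not shared:
        raise ValueError("need one shared and one alone IPC per application")
    return [s / a for s, a in zip(shared, alone)]


def throughput(shared: Sequence[float]) -> float:
    """Sum of the IPCs of all co-running applications."""
    return sum(shared)


def weighted_speedup(shared: Sequence[float], alone: Sequence[float]) -> float:
    """Sum of relative IPCs."""
    return sum(relative_ipcs(shared, alone))


def harmonic_mean_fairness(shared: Sequence[float], alone: Sequence[float]) -> float:
    """Harmonic mean of relative IPCs; it drops sharply if any single
    application is slowed down much more than the others."""
    rel = relative_ipcs(shared, alone)
    return len(rel) / sum(1.0 / r for r in rel)


if __name__ == "__main__":
    # Invented IPC values for two applications.
    shared_ipc = [1.0, 0.5]
    alone_ipc = [2.0, 1.0]
    print(throughput(shared_ipc))
    print(weighted_speedup(shared_ipc, alone_ipc))
    print(harmonic_mean_fairness(shared_ipc, alone_ipc))
```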



Published In

ACM SIGOPS Operating Systems Review Volume 43, Issue 2

April 2009

119 pages

ISSN: 0163-5980

DOI: 10.1145/1531793

Copyright © 2009 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
