
FlexDCP: a QoS framework for CMP architectures

Published: 21 April 2009

Abstract

Current multicore architectures offer high throughput by increasing hardware resource utilization. As the number of cores in a multicore system increases, providing Quality of Service (QoS) to applications in addition to throughput is becoming an important problem.
In this work, we present FlexDCP, a framework that allows the Operating System (OS) to guarantee a QoS for each application running in a chip multiprocessor. FlexDCP directly estimates the performance of applications for different cache configurations instead of using indirect measures of performance like the number of misses. This information allows the OS to convert QoS requirements into resource assignments. Consequently, it offers more flexibility to the OS as it can optimize different QoS metrics like per-application performance or global performance metrics such as fairness, weighted speed up or throughput.
Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, falling on average 1.48% short of the objective in the remaining cases. When optimizing a global QoS metric like fairness, FlexDCP consistently outperforms traditional eviction policies like LRU and pseudo-LRU, as well as previous dynamic cache partitioning proposals, for two-, four- and eight-core configurations. In an eight-core architecture, FlexDCP obtains a fairness improvement of 10.1% over Fair, the best policy in the literature optimizing fairness.
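
The conversion from a QoS requirement to a cache assignment can be illustrated with a minimal sketch. It is not the FlexDCP implementation: it only assumes that the OS already holds, for each application, an estimated IPC for every possible number of cache ways. The names `min_ways_for_target` and `assign_ways`, the shape of the input tables, and the sample numbers are all hypothetical.

```python
from typing import Dict, List, Optional


def min_ways_for_target(ipc_by_ways: List[float], target_fraction: float) -> int:
    """Return the smallest way count whose estimated IPC meets the target.

    ipc_by_ways[k] is the (hypothetical) estimated IPC with k + 1 ways.
    The target is a fraction of the best estimated IPC in the table.
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    goal = target_fraction * max(ipc_by_ways)
    for ways, ipc in enumerate(ipc_by_ways, start=1):
        if ipc >= goal:
            return ways
    return len(ipc_by_ways)  # not reached when the target is in (0, 1]


def assign_ways(
    estimates: Dict[str, List[float]],
    targets: Dict[str, float],
    total_ways: int,
) -> Optional[Dict[str, int]]:
    """Give each application the fewest ways that satisfy its target.

    Returns None when the targets cannot all be met with total_ways.
    """
    needed = {
        app: min_ways_for_target(curve, targets[app])
        for app, curve in estimates.items()
    }
    if sum(needed.values()) > total_ways:
        return None
    return needed


if __name__ == "__main__":
    # Made-up IPC estimates for 1..4 ways, for illustration only.
    estimates = {"a": [0.5, 0.8, 0.95, 1.0], "b": [1.0, 1.1, 1.15, 1.2]}
    targets = {"a": 0.75, "b": 0.9}
    print(assign_ways(estimates, targets, total_ways=8))
```

In this sketch any ways left over after all targets are met stay unassigned; a real policy could hand them out to optimize a global metric such as fairness or throughput.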

Published In

ACM SIGOPS Operating Systems Review, Volume 43, Issue 2
April 2009
119 pages
ISSN:0163-5980
DOI:10.1145/1531793
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache partitioning
  2. multicore systems
  3. operating systems
  4. performance predictability
  5. quality of service

Qualifiers

  • Research-article

Cited By

  • (2022) Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications. ACM Transactions on Architecture and Code Optimization 19:2 (1-27). DOI: 10.1145/3505559. Online publication date: 31-Jan-2022
  • (2021) Machine-learning-based cache partition method in cloud environment. Peer-to-Peer Networking and Applications. DOI: 10.1007/s12083-021-01235-x. Online publication date: 6-Sep-2021
  • (2020) Jumanji: The Case for Dynamic NUCA in the Datacenter. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (665-680). DOI: 10.1109/MICRO50266.2020.00061. Online publication date: Oct-2020
  • (2020) Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (590-601). DOI: 10.1109/IPDPS47924.2020.00067. Online publication date: May-2020
  • (2020) Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.05.006. Online publication date: Jun-2020
  • (2020) Novel Fairness-aware Co-scheduling for Shared Cache Contention Game on Chip Multiprocessors. Information Sciences. DOI: 10.1016/j.ins.2020.03.078. Online publication date: Apr-2020
  • (2019) Scavenger. Proceedings of the ACM Symposium on Cloud Computing (272-285). DOI: 10.1145/3357223.3362734. Online publication date: 20-Nov-2019
  • (2019) DynaSprint. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (426-439). DOI: 10.1145/3352460.3358301. Online publication date: 12-Oct-2019
  • (2019) QoS-Driven Coordinated Management of Resources to Save Energy in Multi-core Systems. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (303-313). DOI: 10.1109/IPDPS.2019.00040. Online publication date: May-2019
  • (2019) A novel index system describing program runtime characteristics for workload consolidation. Frontiers of Computer Science: Selected Publications from Chinese Universities 13:3 (489-499). DOI: 10.1007/s11704-018-6614-2. Online publication date: 1-Jun-2019
