DOI: 10.1145/2155620.2155650
research-article

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations

Published: 03 December 2011

Abstract

As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. Being unable to accurately predict performance degradation due to contention for shared resources on multicore systems has led to the heavy-handed approach of simply disallowing the co-location of high-priority, latency-sensitive tasks with other tasks. Performing this precise prediction has been a challenging and unsolved problem.
In this paper, we present Bubble-Up, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem. By using a bubble to apply a tunable amount of "pressure" to the memory subsystem on processors in production datacenters, our methodology can predict the performance interference between co-located applications with an accuracy within 1% to 2% of the actual performance degradation. Using this methodology to arrive at "sensible" co-locations in Google's production datacenters with real-world large-scale applications, we can improve the utilization of a 500-machine cluster by 50% to 90% while guaranteeing a high quality of service for latency-sensitive applications.
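
The prediction step described above can be illustrated with a small sketch. It assumes, beyond what the abstract states, that each latency-sensitive application has been profiled against the bubble at a few pressure levels (giving a pressure-to-degradation curve) and that each candidate co-runner has been assigned a single pressure score. The function names, the linear interpolation, and all numbers are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def predict_degradation(sensitivity_curve, pressure):
    """Estimate degradation at `pressure` by linear interpolation.

    `sensitivity_curve` is a list of (pressure, degradation) pairs measured
    by running the application next to a bubble (hypothetical data format).
    Values outside the measured range are clamped to the nearest endpoint.
    """
    points = sorted(sensitivity_curve)
    xs = [p for p, _ in points]
    if pressure <= xs[0]:
        return points[0][1]
    if pressure >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, pressure)  # xs[i - 1] <= pressure < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (pressure - x0) / (x1 - x0)


def sensible_colocations(sensitivity_curve, candidate_pressures, max_degradation):
    """Return the candidates whose predicted impact stays within the QoS bound.

    `candidate_pressures` maps a co-runner name to its pressure score
    (an assumed, pre-measured quantity).
    """
    return [
        name
        for name, pressure in candidate_pressures.items()
        if predict_degradation(sensitivity_curve, pressure) <= max_degradation
    ]


if __name__ == "__main__":
    # Made-up profile: degradation grows as bubble pressure increases.
    curve = [(0.0, 0.0), (10.0, 0.04), (20.0, 0.10)]
    candidates = {"batch-a": 4.0, "batch-b": 18.0}
    # Allow at most 5% predicted degradation for the latency-sensitive task.
    print(sensible_colocations(curve, candidates, max_degradation=0.05))
```

Under these assumptions, a scheduler would admit only the co-runners returned by `sensible_colocations`, trading a bounded, predicted slowdown for higher machine utilization.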


Published In

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
December 2011
519 pages
ISBN:9781450310536
DOI:10.1145/2155620
  • Conference Chair: Carlo Galuzzi
  • General Chair: Luigi Carro
  • Program Chairs: Andreas Moshovos, Milos Prvulovic
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 124
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) Co-Approximator: Enabling Performance Prediction in Colocated Applications. ACM Transactions on Embedded Computing Systems 24:1 (1-28). DOI: 10.1145/3677180. Online publication date: 25-Jul-2024
  • (2024) CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling. ACM Transactions on Architecture and Code Optimization 21:3 (1-27). DOI: 10.1145/3664925. Online publication date: 14-May-2024
  • (2024) Prudentia: Findings of an Internet Fairness Watchdog. Proceedings of the ACM SIGCOMM 2024 Conference (506-520). DOI: 10.1145/3651890.3672229. Online publication date: 4-Aug-2024
  • (2024) Modeling Memory Bandwidth Interference in Cloud Data Centers via Deep Learning. 2024 9th International Conference on Computer and Communication Systems (ICCCS) (441-447). DOI: 10.1109/ICCCS61882.2024.10603358. Online publication date: 19-Apr-2024
  • (2024) Syscall Analysis for Resource Stress Identification for Container Network Functions. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD) (256-266). DOI: 10.1109/CLOUD62652.2024.00037. Online publication date: 7-Jul-2024
  • (2024) A self-stabilizing and auto-provisioning orchestration for microservices in edge-cloud continuum. Computer Networks 242 (110279). DOI: 10.1016/j.comnet.2024.110279. Online publication date: Apr-2024
  • (2024) Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics. International Journal of Parallel Programming 52:4 (298-323). DOI: 10.1007/s10766-024-00771-2. Online publication date: 28-May-2024
  • (2024) DeepMRA: An Efficient Microservices Resource Allocation Framework with Deep Reinforcement Learning in the Cloud. Advanced Intelligent Computing Technology and Applications (455-466). DOI: 10.1007/978-981-97-5581-3_37. Online publication date: 5-Aug-2024
  • (2024) Running Serverless Function on Resource Fragments in Data Center. Algorithms and Architectures for Parallel Processing (443-462). DOI: 10.1007/978-981-97-0798-0_26. Online publication date: 1-Mar-2024
  • (2023) Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing. ACM Transactions on Computer Systems 42:1-2 (1-37). DOI: 10.1145/3630006. Online publication date: 18-Nov-2023
