research-article

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations

Authors:

Mary Lou Soffa

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 248 - 259

https://doi.org/10.1145/2155620.2155650

Published: 03 December 2011

Abstract

As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. The inability to accurately predict performance degradation due to contention for shared resources on multicore systems has led to the heavy-handed approach of simply disallowing the co-location of high-priority, latency-sensitive tasks with other tasks. Performing this precise prediction has been a challenging and unsolved problem.

In this paper, we present Bubble-Up, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem. By using a bubble to apply a tunable amount of "pressure" to the memory subsystem on processors in production datacenters, our methodology can predict the performance interference between co-located applications with an accuracy within 1% to 2% of the actual performance degradation. Using this methodology to arrive at "sensible" co-locations in Google's production datacenters with real-world large-scale applications, we can improve the utilization of a 500-machine cluster by 50% to 90% while guaranteeing a high quality of service for latency-sensitive applications.
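
The abstract does not spell out how the prediction is computed, so the following is only a minimal sketch of one way the idea could be used. It assumes that a latency-sensitive application has been profiled against the bubble at several pressure levels, giving a curve of normalized performance versus pressure, and that a candidate co-runner has been assigned a single pressure value. The function names, the curve data, and the QoS threshold are all hypothetical and do not come from the paper.

```python
from bisect import bisect_left


def predict_qos(sensitivity_curve, pressure):
    """Estimate normalized performance (1.0 = running alone) at a pressure level.

    sensitivity_curve is a list of (pressure, qos) pairs sorted by pressure.
    Values between measured points are linearly interpolated; values outside
    the measured range are clamped to the nearest endpoint.
    """
    pressures = [p for p, _ in sensitivity_curve]
    if pressure <= pressures[0]:
        return sensitivity_curve[0][1]
    if pressure >= pressures[-1]:
        return sensitivity_curve[-1][1]
    i = bisect_left(pressures, pressure)
    (p0, q0), (p1, q1) = sensitivity_curve[i - 1], sensitivity_curve[i]
    return q0 + (q1 - q0) * (pressure - p0) / (p1 - p0)


def is_sensible_colocation(sensitivity_curve, corunner_pressure, qos_target):
    """Allow the co-location only if the predicted QoS meets the target."""
    return predict_qos(sensitivity_curve, corunner_pressure) >= qos_target


# Made-up measurements: (bubble pressure, normalized performance).
curve = [(0, 1.00), (10, 0.98), (20, 0.94), (30, 0.85)]

print(is_sensible_colocation(curve, 15, 0.95))  # True
print(is_sensible_colocation(curve, 30, 0.95))  # False
```

With a check of this kind, a scheduler would permit only those pairings whose predicted degradation stays within the QoS target, rather than disallowing co-location outright.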

Index Terms

Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
1. General and reference
  1. Cross-computing tools and techniques
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Monitors

Published In

MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

December 2011

519 pages

ISBN: 9781450310536

DOI: 10.1145/2155620

Conference Chair: Carlo Galuzzi (Technische Universiteit Delft, The Netherlands)

General Chair: Luigi Carro (Universidade Federal do Rio Grande do Sul, Brazil)

Program Chairs: Andreas Moshovos (University of Toronto, Canada), Milos Prvulovic (Georgia Institute of Technology, United States)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE
ACM: Association for Computing Machinery
UFRGS: Universidade Federal do Rio Grande do Sul
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

MICRO-44

MICRO-44: The 44th Annual IEEE/ACM International Symposium on Microarchitecture

December 3 - 7, 2011

Porto Alegre, Brazil

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
