research-article

Open access

Improving Resource Efficiency at Scale with Heracles

Authors: David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, Christos Kozyrakis

ACM Transactions on Computer Systems (TOCS), Volume 34, Issue 2

Article No.: 6, Pages 1 - 33

https://doi.org/10.1145/2882783

Published: 05 May 2016


Abstract

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy efficiency of large-scale datacenters. With the slowdown in technology scaling caused by the sunsetting of Moore’s law, it becomes important to address this opportunity.

We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
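The feedback idea in the abstract can be illustrated with a minimal sketch. This is not the authors' controller: the names `ControllerState` and `step`, the single "cores" resource, and the 10% growth threshold are all hypothetical choices made for illustration. The sketch only shows the general shape of a latency-slack feedback loop, in which best-effort (BE) tasks gain resources while the latency-critical (LC) service has headroom and lose them when the latency target is at risk.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Resources currently granted to best-effort (BE) tasks (hypothetical model)."""
    be_cores: int = 0
    be_enabled: bool = True


def step(state, tail_latency_ms, slo_ms, total_cores,
         lc_min_cores=1, grow_slack=0.10):
    """One control iteration driven by latency slack.

    slack = (SLO - measured tail latency) / SLO.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    slack = (slo_ms - tail_latency_ms) / slo_ms
    max_be = total_cores - lc_min_cores  # always leave cores for the LC service

    if slack < 0:
        # SLO violated: take all cores back from BE tasks and pause them.
        return ControllerState(be_cores=0, be_enabled=False)
    if slack < grow_slack:
        # Little headroom: shrink the BE allocation by one core.
        return ControllerState(be_cores=max(state.be_cores - 1, 0), be_enabled=True)
    # Comfortable headroom: grow the BE allocation by one core, up to the cap.
    return ControllerState(be_cores=min(state.be_cores + 1, max_be), be_enabled=True)
```

A real controller of this kind would manage several resources at once (the abstract names CPU, memory, and network isolation) and would be called periodically with fresh latency measurements; the sketch collapses that to one resource and one call per measurement.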



Index Terms

Improving Resource Efficiency at Scale with Heracles
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling


Reviews

Reviewer: Bayard Kohlhepp

Most of the paper's authors are connected to Google, and their work centers on Google workload performance improvement. They've developed runtime controller software, Heracles, that uses real-time feedback and static modeling rules to adjust resource allocation within servers in order to meet service-level objectives (SLOs). The paper's closing section demonstrates that Heracles improved performance in test systems. It's great that they've made Google faster, but what use is this Google performance tool to the rest of us? Unless and until they make Heracles freely downloadable (and we have server applications that can make use of it), the tool itself is of no general interest.

The value of this paper, though, lies not in the end product, but in the journey that led to the end product. The first nine or ten pages describe the authors' analysis of resource contention, specifically the interplay between latency-critical (LC) tasks and noncritical, best-effort (BE) tasks. All applications, from Internet of Things (IoT) to the cloud, on smartphones and in data center servers, face the problem of guaranteeing quick response from critical services despite the unpredictable activity of background tasks. At present, we solve the problem by overallocating resources. We throw money at the problem, paying for peak usage scenarios, while day in and day out we tolerate idle central processing units (CPUs) and underutilized storage.

The analysis that led to Heracles, summarized in this paper, brings us a step closer to building efficient systems. The authors have created a template we can all use to analyze resource contention. They also identify specific tools and techniques used to address contention issues, quantify performance improvements achieved by using those tools, and survey numerous research contributors for further investigation. The rest of us will probably never use Heracles, but we can all use this advice to improve our own little corner of the universe.
Online Computing Reviews Service


Published In


ACM Transactions on Computer Systems Volume 34, Issue 2

May 2016

96 pages

ISSN:0734-2071

EISSN:1557-7333

DOI:10.1145/2912575

Editor:
Todd C. Mowry
Carnegie Mellon University, Pittsburgh, PA


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 May 2016

Accepted: 01 January 2016

Received: 01 October 2015

Published in TOCS Volume 34, Issue 2


Qualifiers

Research-article
Research
Refereed

Funding Sources

Google PhD Fellowship
Stanford Experimental Datacenter Lab
NSF
Google research


