research-article
Open access

Improving Resource Efficiency at Scale with Heracles

Published: 05 May 2016

Abstract

User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy efficiency of large-scale datacenters. With the slowdown in technology scaling caused by the sunsetting of Moore’s law, it becomes important to address this opportunity.
We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
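As a rough illustration of what a feedback-based colocation controller does, the Python sketch below adjusts the number of cores lent to best-effort tasks from the measured tail latency of the latency-critical service. It is not the Heracles algorithm: the `Allocation` class, the `control_step` function, the 10% slack threshold, and the one-core step size are all hypothetical, and the sketch covers only CPU cores, not the memory or network isolation the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Cores on one server, split between the two task classes (hypothetical model)."""
    total_cores: int
    be_cores: int = 0  # cores currently lent to best-effort (BE) tasks

    @property
    def lc_cores(self) -> int:
        # Whatever is not lent to BE tasks stays with the latency-critical (LC) service.
        return self.total_cores - self.be_cores


def control_step(alloc: Allocation, tail_latency_ms: float, slo_ms: float,
                 grow_slack: float = 0.10, min_lc_cores: int = 1) -> Allocation:
    """Run one iteration of a latency-slack feedback loop.

    slack = (target - measured) / target. The threshold and step size are
    illustrative assumptions, not values taken from the paper.
    """
    slack = (slo_ms - tail_latency_ms) / slo_ms
    if slack < 0:
        # Latency target violated: reclaim every core from BE tasks.
        be = 0
    elif slack < grow_slack:
        # Close to the target: take one core back from BE tasks.
        be = max(alloc.be_cores - 1, 0)
    else:
        # Comfortable slack: lend one more core, keeping a floor for the LC service.
        be = min(alloc.be_cores + 1, alloc.total_cores - min_lc_cores)
    return Allocation(alloc.total_cores, be)
```

For instance, with 8 cores of which 3 are lent out, a measured tail latency of 5 ms against a 10 ms target leaves 50% slack, so the sketch lends a fourth core; a measurement of 12 ms reclaims all of them.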

Cited By

  • (2024) Resource Management in Aurora Serverless. Proceedings of the VLDB Endowment 17:12 (4038-4050). DOI: 10.14778/3685800.3685825. Online publication date: 1-Aug-2024
  • (2024) Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (119-124). DOI: 10.1109/CCGrid59990.2024.00023. Online publication date: 6-May-2024
  • (2023) Quantifying Resource Contention of Co-located Workloads with the System-level Entropy. ACM Transactions on Architecture and Code Optimization 20:1 (1-25). DOI: 10.1145/3563696. Online publication date: 10-Feb-2023

Reviews

Bayard Kohlhepp

Most of the paper's authors are connected to Google, and their work centers on Google workload performance improvement. They've developed runtime controller software, Heracles, that uses real-time feedback and static modeling rules to adjust resource allocation within servers in order to meet service-level objectives (SLOs). The paper's closing section demonstrates that Heracles improved performance in test systems.
It's great that they've made Google faster, but what use is this Google performance tool to the rest of us? Unless and until they make Heracles freely downloadable (and we have server applications that can make use of it), the tool itself is of no general interest.
The value of this paper, though, lies not in the end product, but in the journey that led to the end product. The first nine or ten pages describe the authors' analysis of resource contention, specifically the interplay between latency-critical (LC) tasks and noncritical, best-effort (BE) tasks. All applications, from Internet of Things (IoT) to the cloud, on smartphones and in data center servers, face the problem of guaranteeing quick response from critical services despite the unpredictable activity of background tasks. At present, we solve the problem by overallocating resources. We throw money at the problem, paying for peak usage scenarios, while day in and day out we tolerate idle central processing units (CPUs) and underutilized storage.
The analysis that led to Heracles, summarized in this paper, brings us a step closer to building efficient systems. The authors have created a template we can all use to analyze resource contention. They also identify specific tools and techniques used to address contention issues, quantify performance improvements achieved by using those tools, and survey numerous research contributors for further investigation. The rest of us will probably never use Heracles, but we can all use this advice to improve our own little corner of the universe.
Online Computing Reviews Service

Published In

ACM Transactions on Computer Systems, Volume 34, Issue 2
May 2016
96 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2912575
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 May 2016
Accepted: 01 January 2016
Received: 01 October 2015
Published in TOCS Volume 34, Issue 2

Author Tags

  1. Datacenter
  2. QoS
  3. interference
  4. latency-critical applications
  5. performance isolation
  6. resource efficiency
  7. scheduling
  8. warehouse-scale computer

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Google PhD Fellowship
  • Stanford Experimental Datacenter Lab
  • NSF
  • Google research

Article Metrics

  • Downloads (Last 12 months): 157
  • Downloads (Last 6 weeks): 31
Reflects downloads up to 09 Nov 2024

Cited By

  • (2023) AutoInfer: Self-Driving Management for Resource-Efficient, SLO-Aware Machine-Learning Inference in GPU Clusters. IEEE Internet of Things Journal 10:7 (6271-6285). DOI: 10.1109/JIOT.2022.3223381. Online publication date: 1-Apr-2023
  • (2023) Inducing Huge Tail Latency on a MongoDB deployment. 2023 IEEE International Conference on Cloud Engineering (IC2E) (107-112). DOI: 10.1109/IC2E59103.2023.00020. Online publication date: 25-Sep-2023
  • (2023) Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (855-869). DOI: 10.1109/HPCA56546.2023.10070939. Online publication date: Feb-2023
  • (2023) BALANCER: bandwidth allocation and cache partitioning for multicore processors. The Journal of Supercomputing 79:9 (10252-10276). DOI: 10.1007/s11227-023-05070-0. Online publication date: 4-Feb-2023
  • (2023) INDICES: Applying DDDAS Principles for Performance Interference-aware Cloud-to-Fog Application Migration. Handbook of Dynamic Data Driven Applications Systems (869-902). DOI: 10.1007/978-3-031-27986-7_33. Online publication date: 31-Mar-2023
  • (2022) SDCBench: A Benchmark Suite for Workload Colocation and Evaluation in Datacenters. Intelligent Computing 2022. DOI: 10.34133/2022/9810691. Online publication date: Jan-2022
  • (2022) Less Provisioning: A Hybrid Resource Scaling Engine for Long-Running Services With Tail Latency Guarantees. IEEE Transactions on Cloud Computing 10:3 (1941-1957). DOI: 10.1109/TCC.2020.3016345. Online publication date: 1-Jul-2022
