DOI: 10.1145/2749469.2749475
research-article
Open access

Heracles: improving resource efficiency at scale

Published: 13 June 2015

    Abstract

    User-facing, latency-sensitive services, such as websearch, underutilize their computing resources during daily periods of low traffic. Reusing those resources for other tasks is rarely done in production services since the contention for shared resources can cause latency spikes that violate the service-level objectives of latency-sensitive tasks. The resulting under-utilization hurts both the affordability and energy-efficiency of large-scale datacenters. With technology scaling slowing down, it becomes important to address this opportunity.
    We present Heracles, a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency-critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency-critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
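
    The abstract names a feedback-based controller but does not spell out its policy. The following is a minimal sketch of what one step of such a loop could look like, under stated assumptions: only CPU cores are modeled (the abstract also lists memory and network isolation), and the names `Allocation`, `control_step`, and the `min_slack` threshold of 10% are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical split of cores between the latency-critical (LC)
    service and best-effort (BE) tasks on one server."""
    total_cores: int
    be_cores: int = 0

    @property
    def lc_cores(self) -> int:
        return self.total_cores - self.be_cores


def control_step(alloc: Allocation, tail_latency_ms: float, slo_ms: float,
                 min_slack: float = 0.10) -> Allocation:
    """One iteration of an illustrative feedback loop.

    slack = (SLO - measured latency) / SLO; negative slack means the
    latency target is currently violated. The 10% threshold is an
    assumption chosen for this sketch.
    """
    slack = (slo_ms - tail_latency_ms) / slo_ms
    if slack < 0:
        # Target violated: reclaim every core from BE tasks.
        be = 0
    elif slack < min_slack:
        # Little headroom left: shrink the BE share by one core.
        be = max(alloc.be_cores - 1, 0)
    else:
        # Comfortable headroom: grow the BE share by one core,
        # always leaving at least one core for the LC service.
        be = min(alloc.be_cores + 1, alloc.total_cores - 1)
    return Allocation(alloc.total_cores, be)
```

    Calling `control_step` periodically with a fresh tail-latency measurement moves cores toward best-effort tasks while latency is well under the target and pulls them back as the target is approached. A real controller would also need to handle the other shared resources the abstract mentions.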


    Published In

    ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
    June 2015
    768 pages
    ISBN: 9781450334020
    DOI: 10.1145/2749469


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
