research-article

Dynamo: Facebook's data center-wide power management system

Published: 18 June 2016

Abstract

Data center power is a scarce resource that often goes underutilized due to conservative planning. This is because the penalty for overloading the data center power delivery hierarchy and tripping a circuit breaker is very high, potentially causing long service outages. Recently, dynamic server power capping, which limits the amount of power consumed by a server, has been proposed and studied as a way to reduce this penalty, enabling more aggressive utilization of provisioned data center power. However, no real at-scale solution for data center-wide power monitoring and control has been presented in the literature.
In this paper, we describe Dynamo -- a data center-wide power management system that monitors the entire power hierarchy and makes coordinated control decisions to safely and efficiently use provisioned data center power. Dynamo has been developed and deployed across all of Facebook's data centers for the past three years. Our key insight is that in real-world data centers, different power and performance constraints at different levels in the power hierarchy necessitate coordinated data center-wide power management.
We make three main contributions. First, to understand the design space of Dynamo, we provide a characterization of power variation in data centers running a diverse set of modern workloads. This characterization uses fine-grained power samples from tens of thousands of servers, spanning a period of over six months. Second, we present the detailed design of Dynamo. Our design addresses several key issues not addressed by previous simulation-based studies. Third, the proposed techniques and design have been deployed and evaluated in large-scale data centers serving billions of users. We present production results showing that Dynamo has prevented 18 potential power outages in the past 6 months due to unexpected power surges; that Dynamo enables optimizations leading to a 13% performance boost for a production Hadoop cluster and a nearly 40% performance increase for a search cluster; and that Dynamo has already enabled an 8% increase in the power capacity utilization of one of our data centers, with more aggressive power subscription measures underway.
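To make the idea of hierarchy-wide monitoring concrete, here is a minimal Python sketch. It is not Dynamo's implementation, which the abstract does not describe. The `PowerNode` class, the 95% headroom threshold, the wattages, and the proportional cut policy are all assumptions chosen for illustration. The sketch shows a case where no rack exceeds its own limit, yet the upstream breaker does, which is why constraints at different levels have to be checked together.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PowerNode:
    """One device in a hypothetical power hierarchy (breaker, rack, or server)."""
    name: str
    limit_watts: float                       # assumed rating of this device
    children: List["PowerNode"] = field(default_factory=list)
    measured_watts: float = 0.0              # only used for leaf servers

    def power(self) -> float:
        """Leaf: its own reading. Interior node: sum of everything below it."""
        if not self.children:
            return self.measured_watts
        return sum(child.power() for child in self.children)

    def servers(self) -> List["PowerNode"]:
        """All leaf servers underneath this node."""
        if not self.children:
            return [self]
        return [s for child in self.children for s in child.servers()]


def find_overloads(node: PowerNode, headroom: float = 0.95) -> Dict[str, float]:
    """Map each device drawing more than headroom * limit to the watts it must shed."""
    overloads: Dict[str, float] = {}
    excess = node.power() - headroom * node.limit_watts
    if excess > 0:
        overloads[node.name] = excess
    for child in node.children:
        overloads.update(find_overloads(child, headroom))
    return overloads


def proportional_caps(node: PowerNode, watts_to_shed: float) -> Dict[str, float]:
    """Split a required cut across the servers under node, in proportion to current draw."""
    total = node.power()
    if total <= 0:
        return {}
    return {
        s.name: s.measured_watts - watts_to_shed * s.measured_watts / total
        for s in node.servers()
    }


# Illustrative numbers only: each rack is under its own limit, the breaker is not.
rack_a = PowerNode("rack-a", 1000, [
    PowerNode("a1", 600, measured_watts=500),
    PowerNode("a2", 600, measured_watts=300),
])
rack_b = PowerNode("rack-b", 1000, [
    PowerNode("b1", 600, measured_watts=400),
    PowerNode("b2", 600, measured_watts=300),
])
breaker = PowerNode("breaker", 1500, [rack_a, rack_b])

overloads = find_overloads(breaker)
caps = proportional_caps(breaker, overloads["breaker"])
print(overloads)
print(caps)
```

In this toy setup only the breaker is flagged, and the resulting per-server caps bring the total back to the breaker's headroom. A production system would also need to account for workload priority and for how quickly a breaker trips, neither of which this sketch models.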

Published In

ACM SIGARCH Computer Architecture News  Volume 44, Issue 3
ISCA'16
June 2016
730 pages
ISSN:0163-5964
DOI:10.1145/3007787
ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture
June 2016
756 pages
ISBN:9781467389471
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2016
Published in SIGARCH Volume 44, Issue 3

Author Tags

  1. data center
  2. management
  3. power

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 192
  • Downloads (Last 6 weeks): 20
Reflects downloads up to 13 Jan 2025

Citations

Cited By

  • (2024) Dynamic Idle Resource Leasing To Safely Oversubscribe Capacity At Meta. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 792-810. DOI: 10.1145/3698038.3698537. Online publication date: 20-Nov-2024
  • (2024) Can Storage Devices be Power Adaptive? Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 47-54. DOI: 10.1145/3655038.3665945. Online publication date: 8-Jul-2024
  • (2024) Expanding Datacenter Capacity with DVFS Boosting: A safe and scalable deployment experience. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 150-165. DOI: 10.1145/3617232.3624853. Online publication date: 27-Apr-2024
  • (2024) Exploring the Frontiers of Energy Efficiency using Power Management at System Scale. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1835-1844. DOI: 10.1109/SCW63240.2024.00230. Online publication date: 17-Nov-2024
  • (2024) Data Center Power and Energy Management: Past, Present, and Future. IEEE Micro 44:5, pp. 30-36. DOI: 10.1109/MM.2024.3426478. Online publication date: 1-Sep-2024
  • (2024) Impact of power consumption in containerized clouds: A comprehensive analysis of open-source power measurement tools. Computer Networks 245 (110371). DOI: 10.1016/j.comnet.2024.110371. Online publication date: May-2024
  • (2023) An End-to-End HPC Framework for Dynamic Power Objectives. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1801-1811. DOI: 10.1145/3624062.3624262. Online publication date: 12-Nov-2023
  • (2023) Peeling Back the Carbon Curtain: Carbon Optimization Challenges in Cloud Computing. Proceedings of the 2nd Workshop on Sustainable Computer Systems, pp. 1-7. DOI: 10.1145/3604930.3605718. Online publication date: 9-Jul-2023
  • (2023) HHVM Performance Optimization for Large Scale Web Services. Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, pp. 137-148. DOI: 10.1145/3578244.3583720. Online publication date: 15-Apr-2023
  • (2023) DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services. Proceedings of the ACM Web Conference 2023, pp. 3067-3076. DOI: 10.1145/3543507.3583437. Online publication date: 30-Apr-2023
