Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Clearing the clouds: a study of emerging scale-out workloads on modern hardware

Published: 03 March 2012 Publication History

Abstract

Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads.
In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today's predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.

References

[1]
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. DBMSs on a modern processor: where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases, September 1999.
[2]
Alexa, The Web Information Company. http://www.alexa.com/.
[3]
Apache Mahout: scalable machine-learning and data-mining library. http://mahout.apache.org/.
[4]
Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC benchmark suite: characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.
[5]
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, volume 7, November 2006.
[6]
Liviu Ciortea, Cristian Zamfir, Stefan Bucur, Vitaly Chipounov, and George Candea. Cloud9: a software testing service. ACM SIGOPS Operating Systems Review, 43:5--10, January 2010.
[7]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, June 2010.
[8]
John D. Davis, James Laudon, and Kunle Olukotun. Maximizing CMP throughput with mediocre cores. In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, September 2005.
[9]
Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, volume 6, November 2004.
[10]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, October 2007.
[11]
PowerEdge M1000e Blade Enclosure. http://www.dell.com/us/enterprise/p/poweredge-m1000e/pd.aspx.
[12]
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In Proceeding of the 38th Annual International Symposium on Computer Architecture, June 2011.
[13]
EuroCloud Server. http://www.eurocloudserver.com.
[14]
Stijn Eyerman, Lieven Eeckhout, Tejas Karkhanis, and James E. Smith. A performance counter architecture for computing accurate CPI components. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, October 2006.
[15]
Facebook Statistics. https://www.facebook.com/press/info.php?statistics.
[16]
Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso. Power provisioning for a warehouse-sized computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture, June 2007.
[17]
Google Data Centers. http://www.google.com/intl/en/corporate/datacenter/.
[18]
Zvika Guz, Oved Itzhak, Idit Keidar, Avinoam Kolod, Avi Mendelson, and Uri C. Weiser. Threads vs. caches: modeling the behavior of parallel workloads. In International Conference on Computer Design, October 2010.
[19]
Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. Understanding sources of inefficiency in general-purpose chips. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010.
[20]
Nikos Hardavellas, Michael Ferdman, Anastasia Ailamaki, and Babak Falsafi. Toward Dark Silicon in Servers. In IEEE Micro, 31(4):6--15, July-Aug, 2011.
[21]
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. In Proceedings of the 36th Annual International Symposium on Computer Architecture, June 2009.
[22]
Nikos Hardavellas, Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia Ailamaki, and Babak Falsafi. Database servers on chip multiprocessors: limitations and opportunities. In The 3rd Biennial Conference on Innovative Data Systems Research, January 2007.
[23]
Faban Harness and Benchmark Framework. http://java.net/projects/faban/.
[24]
Mark Horowitz, Elad Alon, Dinesh Patil, Samuel Naffziger, Rajesh Kumar, and Kerry Bernstein. Scaling, power, and the future of CMOS. In Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, December 2005.
[25]
Shengsheng Huang, Jie Huang, Jinquan Dai, Tao Xie, and Bo Huang. The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In 26th International Conference on Data Engineering Workshops, March 2010.
[26]
Intel VTune Amplifier XE Performance Profiler. http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/.
[27]
Tejas S. Karkhanis and James E. Smith. A first-order superscalar processor model. In Proceedings of the 31st Annual International Symposium on Computer Architecture, June 2004.
[28]
Kimberly Keeton, David A. Patterson, Yong Qiang He, Roger C. Raphael, and Walter E. Baker. Performance characterization of a quad Pentium Pro SMP using OLTP workloads. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
[29]
Taeho Kgil, Shaun D'Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Trevor Mudge, Steven Reinhardt, and Krisztian Flautner. PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, October 2006.
[30]
Christos Kozyrakis, Aman Kansal, Sriram Sankar, and Kushagra Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4):8--19, July-Aug, 2010.
[31]
Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. CloudCmp: comparing public cloud providers. In Proceedings of the 10th Annual Conference on Internet Measurement, November 2010.
[32]
Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. CloudCMP: Shopping for a cloud made easy. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, June 2010.
[33]
Kevin Lim, Parthasarathy Ranganathan, Jichuan Chang, Chandrakant Patel, Trevor Mudge, and Steven Reinhardt. Understanding and designing new server architectures for emerging warehouse-computing environments. In Proceedings of the 35th Annual International Symposium on Computer Architecture, June 2008.
[34]
Jack L. Lo, Luiz André Barroso, Susan J. Eggers, Kourosh Gharachorloo, Henry M. Levy, and Sujay S. Parekh. An analysis of database workload performance on simultaneous multithreaded processors. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
[35]
Open Compute Project. http://opencompute.org/.
[36]
Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, and Luiz André Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.
[37]
Parthasarathy Ranganathan and Norman Jouppi. Enterprise IT trends and implications for architecture research. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture, February 2005.
[38]
Vijay Janapa Reddi, Benjamin C. Lee, Trishul Chilimbi, and Kushagra Vaid. Web search using mobile cores: quantifying and mitigating the price of efficiency. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010.
[39]
SeaMicro Packs 768 Cores Into its Atom Server. http://www.datacenterknowledge.com/archives/2011/07/18/seamicro-packs-768-cores-into-its-atom-server/.
[40]
Will Sobel, Shanti Subramanyam, Akara Sucharitakul, Jimmy Nguyen, Hubert Wong, Arthur Klepchukov, Sheetal Patil, Armando Fox, and David Patterson. Cloudstone: multi-platform, multi-language benchmark and measurement tools for web 2.0. In the 1st Workshop on Cloud Computing and Its Applications, October 2008.
[41]
Vijayaraghavan Soundararajan and Jennifer M. Anderson. The impact of management operations on the virtualized datacenter. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010.
[42]
Lingjia Tang, Jason Mars, Veil Vachharajani, Robert Hundt, and Mary Lou Soffa. The impact of memory subsystem resource sharing on datacenter applications. In Proceeding of the 38th Annual International Symposium on Computer Architecture, June 2011.
[43]
The Apache Cassandra Project. http://cassandra.apache.org/.
[44]
TPC Transaction Processing Performance Council. http://www.tpc.org/default.asp.
[45]
Nathan Tuck and Dean M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, September 2003.
[46]
Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael B. Taylor. Conservation cores: reducing the energy of mature computations. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2010.
[47]
Thomas F. Wenisch, Roland E. Wunderlich, Michael Ferdman, Anastassia Ailamaki, Babak Falsafi, and James C. Hoe. Simflex: Statistical sampling of computer system simulation. IEEE Micro, 26:18--31, July 2006.

Cited By

View all
  • (2024)BTIP: Branch Triggered Instruction Prefetcher Ensuring TimelinessElectronics10.3390/electronics1321432313:21(4323)Online publication date: 4-Nov-2024
  • (2024)The Case of Unsustainable CPU AffinityACM SIGEnergy Energy Informatics Review10.1145/3698365.36983714:3(32-38)Online publication date: 1-Jul-2024
  • (2024)CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature ScalingACM Transactions on Architecture and Code Optimization10.1145/366492521:3(1-27)Online publication date: 14-May-2024
  • Show More Cited By

Index Terms

  1. Clearing the clouds: a study of emerging scale-out workloads on modern hardware

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 4
    ASPLOS '12
    April 2012
    453 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2248487
    Issue’s Table of Contents
    • cover image ACM Conferences
      ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
      March 2012
      476 pages
      ISBN:9781450307598
      DOI:10.1145/2150976
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 March 2012
    Published in SIGPLAN Volume 47, Issue 4

    Check for updates

    Author Tags

    1. architectural evaluation
    2. cloud computing
    3. design insights
    4. workload characterization

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)446
    • Downloads (Last 6 weeks)51
    Reflects downloads up to 25 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)BTIP: Branch Triggered Instruction Prefetcher Ensuring TimelinessElectronics10.3390/electronics1321432313:21(4323)Online publication date: 4-Nov-2024
    • (2024)The Case of Unsustainable CPU AffinityACM SIGEnergy Energy Informatics Review10.1145/3698365.36983714:3(32-38)Online publication date: 1-Jul-2024
    • (2024)CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature ScalingACM Transactions on Architecture and Code Optimization10.1145/366492521:3(1-27)Online publication date: 14-May-2024
    • (2024)Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)10.1109/ISPASS61541.2024.00013(24-35)Online publication date: 5-May-2024
    • (2024)Compiler-Directed Whole-System Persistence2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00074(961-977)Online publication date: 29-Jun-2024
    • (2024)Counter-light Memory Encryption2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00058(724-738)Online publication date: 29-Jun-2024
    • (2024)AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00012(17-31)Online publication date: 29-Jun-2024
    • (2024)START: Scalable Tracking for any Rowhammer Threshold2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00049(578-592)Online publication date: 2-Mar-2024
    • (2024)Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00030(280-296)Online publication date: 2-Mar-2024
    • (2024)Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN58291.2024.00024(99-114)Online publication date: 24-Jun-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media