research-article
Open access

Hadoop Superlinear Scalability: The perpetual motion of parallel performance

Published: 08 May 2015
    Abstract

    "We often see more than 100 percent speedup efficiency!" came the rejoinder to the innocent reminder that you can't have more than 100 percent of anything. But this was just the first volley from software engineers during a presentation on how to quantify computer system scalability in terms of the speedup metric. In different venues, on subsequent occasions, that retort seemed to grow into a veritable chorus: not only was superlinear speedup commonly observed, but the model used to quantify scalability for the past 20 years also failed when applied to superlinear speedup data.
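
    To make the disputed quantity concrete, here is a minimal sketch of the usual textbook definitions: speedup as the single-node runtime divided by the p-node runtime, and efficiency as speedup divided by p. The function names and the timing values are hypothetical, chosen only for illustration; they are not taken from the article's measurements.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S(p) = T(1) / T(p): how much faster the p-node run is."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p; 1.0 corresponds to linear (100 percent) scaling."""
    return speedup(t1, tp) / p


def is_superlinear(t1: float, tp: float, p: int) -> bool:
    """True when the measured efficiency exceeds 100 percent."""
    return efficiency(t1, tp, p) > 1.0


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, keyed by node count (illustrative only).
    t1 = 1000.0
    runtimes = {2: 520.0, 4: 240.0, 8: 130.0}
    for p, tp in runtimes.items():
        print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 2),
              is_superlinear(t1, tp, p))
```

    An efficiency above 1.0 in such a calculation is what the engineers in the anecdote describe as "more than 100 percent."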


    Published In

    Queue  Volume 13, Issue 5
    Testing
    May 2015
    34 pages
    ISSN:1542-7730
    EISSN:1542-7749
    DOI:10.1145/2773212
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Popular
    • Refereed

