MeT: workload aware elasticity for NoSQL

Research article
DOI: 10.1145/2465351.2465370
Published: 15 April 2013

    Abstract

    NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data across all available nodes, which allows them to scale. Unfortunately, current solutions scale out without regard to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations.
    In this paper, we first show that judicious placement of HBase partitions that takes data access patterns into account can improve overall throughput by 35%. Next, we go beyond current state-of-the-art elastic systems, which are limited to uninformed replica addition and removal, by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured for the expected access pattern.
    MeT is a prototype of a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of an HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manually configured cluster and ii) quickly reconfigure the cluster in response to unpredicted workload changes.
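
    The idea of placing partitions with load in mind can be illustrated with a minimal sketch. It is not the algorithm used by MeT, which the abstract does not describe; it is a generic greedy heuristic (heaviest partition first, always onto the least loaded node). The function names, partition names, node names, and request counts below are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def place_partitions(load: Dict[str, int], nodes: List[str]) -> Dict[str, List[str]]:
    """Greedily assign partitions to nodes, heaviest partition first.

    `load` maps a partition name to an observed request count (hypothetical
    monitoring data). The result maps each node to its assigned partitions.
    This is an illustrative heuristic, not the placement logic of MeT.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    # Min-heap of (accumulated load, node name): the least loaded node is on top.
    heap: List[Tuple[int, str]] = [(0, node) for node in nodes]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    # Sort by load descending, then by name, so the result is deterministic.
    for part, reqs in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        placement[node].append(part)
        heapq.heappush(heap, (total + reqs, node))
    return placement


def node_loads(load: Dict[str, int], placement: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum the request counts of the partitions assigned to each node."""
    return {node: sum(load[p] for p in parts) for node, parts in placement.items()}


if __name__ == "__main__":
    observed = {"r1": 900, "r2": 500, "r3": 400, "r4": 100, "r5": 100}
    result = place_partitions(observed, ["n1", "n2"])
    print(result)
    print(node_loads(observed, result))
```

    A placement that also distinguishes read-heavy from write-heavy partitions, as the abstract suggests, would need a per-partition access profile in addition to the single request count assumed here.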

    Information

    Published In

    EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
    April 2013
    401 pages
    ISBN: 9781450319942
    DOI: 10.1145/2465351

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    EuroSys '13: Eighth EuroSys Conference 2013
    April 15 - 17, 2013
    Prague, Czech Republic

    Acceptance Rates

    EuroSys '13 Paper Acceptance Rate 28 of 143 submissions, 20%;
    Overall Acceptance Rate 241 of 1,308 submissions, 18%

    Cited By

    • (2023) InfiniStore: Elastic Serverless Cloud Storage. Proceedings of the VLDB Endowment 16:7 (1629-1642). DOI: 10.14778/3587136.3587139. Online publication date: 1-Mar-2023
    • (2020) Automated Configuration of NoSQL Performance and Scalability Tactics for Data-Intensive Applications. Informatics 7:3 (29). DOI: 10.3390/informatics7030029. Online publication date: 8-Aug-2020
    • (2020) Cost-Effective Data Feeds to Blockchains via Workload-Adaptive Data Replication. Proceedings of the 21st International Middleware Conference (371-385). DOI: 10.1145/3423211.3425696. Online publication date: 7-Dec-2020
    • (2020) Dynamic Load Balance for Hot-spot and Unbalance Region Problems in HBase. 2020 IEEE International Conference on Big Data (Big Data) (2583-2589). DOI: 10.1109/BigData50022.2020.9378465. Online publication date: 10-Dec-2020
    • (2020) A data distribution model for RDF. Distributed and Parallel Databases. DOI: 10.1007/s10619-020-07296-w. Online publication date: 16-May-2020
    • (2019) BurScale. Proceedings of the ACM Symposium on Cloud Computing (126-138). DOI: 10.1145/3357223.3362706. Online publication date: 20-Nov-2019
    • (2019) Multi-Level Elasticity for Data Stream Processing. IEEE Transactions on Parallel and Distributed Systems 30:10 (2326-2337). DOI: 10.1109/TPDS.2019.2907950. Online publication date: 1-Oct-2019
    • (2019) Decision-Making Approaches for Performance QoS in Distributed Storage Systems: A Survey. IEEE Transactions on Parallel and Distributed Systems (1-1). DOI: 10.1109/TPDS.2019.2893940. Online publication date: 2019
    • (2019) A Case for Dynamically Programmable Storage Background Tasks. 2019 38th International Symposium on Reliable Distributed Systems Workshops (SRDSW) (7-12). DOI: 10.1109/SRDSW49218.2019.00009. Online publication date: Oct-2019
    • (2019) Kaa: Evaluating Elasticity of Cloud-Hosted DBMS. 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) (54-61). DOI: 10.1109/CloudCom.2019.00020. Online publication date: Dec-2019