
Clay: fine-grained adaptive partitioning for general database schemas

Published: 01 November 2016
  • Abstract

    Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which is considerably slower and leads to poor performance. Previous work on elastic database repartitioning has focused on a certain class of applications whose database schema can be represented in a hierarchical tree structure. But many applications cannot be partitioned in this manner, and thus are subject to distributed transactions that impede their performance and scalability.
    In this paper, we present a new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex "general" schemas with arbitrary foreign key relationships. Clay dynamically creates blocks of tuples to migrate among servers during repartitioning, placing no constraints on the schema but taking care to balance load and reduce the amount of data migrated. Clay achieves this goal by including in each block a set of hot tuples and other tuples co-accessed with these hot tuples. To evaluate our approach, we integrate Clay in a distributed, main-memory DBMS and show that it can generate partitioning schemes that enable the system to achieve up to 15× better throughput and 99% lower latency than existing approaches.
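
    The two ideas in the abstract, that a poor partitioning produces distributed transactions and that moving a hot tuple together with its co-accessed tuples can remove them, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`count_distributed`, `build_coaccess`, `grow_clump`, `migrate`), the greedy growth rule, the `max_size` cap, and the toy workload are all assumptions made for illustration, and load balancing is deliberately left out.

```python
from collections import defaultdict
from itertools import combinations


def count_distributed(transactions, placement):
    """Count transactions whose tuples live on more than one partition."""
    return sum(1 for txn in transactions
               if len({placement[t] for t in txn}) > 1)


def build_coaccess(transactions):
    """Return per-tuple access counts and pairwise co-access counts."""
    heat = defaultdict(int)
    edges = defaultdict(lambda: defaultdict(int))
    for txn in transactions:
        tuples = sorted(set(txn))
        for t in tuples:
            heat[t] += 1
        for a, b in combinations(tuples, 2):
            edges[a][b] += 1
            edges[b][a] += 1
    return heat, edges


def grow_clump(seed, edges, max_size):
    """Greedily grow a block around a hot seed tuple (hypothetical rule).

    At each step, add the outside tuple with the heaviest co-access edge
    to any tuple already in the block, until max_size is reached.
    """
    clump = {seed}
    while len(clump) < max_size:
        best, best_weight = None, 0
        for member in sorted(clump):
            for neighbour, weight in sorted(edges[member].items()):
                if neighbour not in clump and weight > best_weight:
                    best, best_weight = neighbour, weight
        if best is None:
            break  # no co-accessed tuples left to add
        clump.add(best)
    return clump


def migrate(clump, placement, target):
    """Return a new placement with every tuple of the block on target."""
    updated = dict(placement)
    for t in clump:
        updated[t] = target
    return updated


# Toy workload (assumed): tuple "a" is hot and mostly accessed with "b",
# but the two start on different partitions.
transactions = [("a", "b"), ("a", "b"), ("a", "b"), ("a", "c"), ("d",), ("d",)]
placement = {"a": 0, "b": 1, "c": 0, "d": 1}

heat, edges = build_coaccess(transactions)
hottest = max(heat, key=heat.get)
clump = grow_clump(hottest, edges, max_size=2)
new_placement = migrate(clump, placement, target=1)

print(count_distributed(transactions, placement))
print(count_distributed(transactions, new_placement))
```

    In this toy workload the block grown around "a" is {"a", "b"}, and moving it to partition 1 reduces the number of distributed transactions from 3 to 1. A real planner would also have to weigh the load on the target partition and the volume of data moved, which the abstract names as explicit goals.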



    Information & Contributors

    Information

    Published In

    Proceedings of the VLDB Endowment, Volume 10, Issue 4
    November 2016
    180 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 November 2016
    Published in PVLDB Volume 10, Issue 4

    Qualifiers

    • Research-article

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 44
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 27 Jul 2024

    Citations

    Cited By

    • (2024) In-memory key-value store live migration with NetMigrate. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pp. 209-224. DOI: 10.5555/3650697.3650710. Online publication date: 27-Feb-2024
    • (2024) SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads. Proceedings of the VLDB Endowment 17:8, pp. 2064-2076. DOI: 10.14778/3659437.3659458. Online publication date: 1-Apr-2024
    • (2024) Scalable High-Quality Hypergraph Partitioning. ACM Transactions on Algorithms 20:1, pp. 1-54. DOI: 10.1145/3626527. Online publication date: 22-Jan-2024
    • (2024) Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology 39:2, pp. 346-368. DOI: 10.1007/s11390-024-3538-1. Online publication date: 1-Mar-2024
    • (2024) RCBench: an RDMA-enabled transaction framework for analyzing concurrency control algorithms. The VLDB Journal 33:2, pp. 543-567. DOI: 10.1007/s00778-023-00821-0. Online publication date: 1-Mar-2024
    • (2023) Efficient Distributed Transaction Processing in Heterogeneous Networks. Proceedings of the VLDB Endowment 16:6, pp. 1372-1385. DOI: 10.14778/3583140.3583153. Online publication date: 20-Apr-2023
    • (2023) Grep: A Graph Learning Based Database Partitioning System. Proceedings of the ACM on Management of Data 1:1, pp. 1-24. DOI: 10.1145/3588948. Online publication date: 30-May-2023
    • (2023) DBPA: A Benchmark for Transactional Database Performance Anomalies. Proceedings of the ACM on Management of Data 1:1, pp. 1-26. DOI: 10.1145/3588926. Online publication date: 30-May-2023
    • (2023) EfShard: Toward Efficient State Sharding Blockchain via Flexible and Timely State Allocation. IEEE Transactions on Network and Service Management 20:3, pp. 2817-2829. DOI: 10.1109/TNSM.2023.3236433. Online publication date: 1-Sep-2023
    • (2022) SkinnerMT. Proceedings of the VLDB Endowment 16:4, pp. 905-917. DOI: 10.14778/3574245.3574272. Online publication date: 1-Dec-2022
