research-article

Scalable and adaptive log manager in distributed systems

Published: 08 August 2022
    Abstract

    On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocols to guarantee durability, high availability, and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries to a stable storage device and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and highly parallel transaction processing, the traditional centralized logging design limits scalability, and a constant trigger condition for replication cannot always maintain optimal performance under dynamic workloads.
    In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by using a highly concurrent data structure and fast log hole tracking. The core of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between the leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-source transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well as the number of worker threads increases, improves peak throughput by 1.56× and reduces latency by more than 4× compared with Raft's log replication, and maintains efficient and stable performance under dynamic workloads.
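
    The abstract names two mechanisms without detailing them: tracking "holes" in a log whose slots are filled out of order, and adapting how many entries are shipped per round. The sketch below is only an illustration of those two general ideas under simplifying assumptions; it is not Salmo's algorithm. The names `HoleTracker`, `mark_filled`, and `next_batch_size`, the doubling/halving rule, and the bounds `lo` and `hi` are all hypothetical. The tracker is single-threaded and assumes each log sequence number (LSN) is marked exactly once.

```python
import heapq


class HoleTracker:
    """Hypothetical helper: finds the filled prefix of an out-of-order log.

    Every LSN strictly below `contiguous_upto` has been filled, so entries
    below that point contain no holes and could be flushed or shipped.
    Assumes each LSN is marked exactly once (illustrative simplification).
    """

    def __init__(self):
        self.contiguous_upto = 0
        self._pending = []  # min-heap of filled LSNs that sit past a hole

    def mark_filled(self, lsn):
        heapq.heappush(self._pending, lsn)
        # Advance over every entry that has now become contiguous.
        while self._pending and self._pending[0] == self.contiguous_upto:
            heapq.heappop(self._pending)
            self.contiguous_upto += 1
        return self.contiguous_upto


def next_batch_size(current, backlog, lo=1, hi=1024):
    """Hypothetical adaptive rule: double the batch when the backlog of
    unshipped entries exceeds it, halve it when the backlog is under half
    of it, and clamp the result to [lo, hi]."""
    if backlog > current:
        current *= 2
    elif backlog < current // 2:
        current //= 2
    return max(lo, min(hi, current))


tracker = HoleTracker()
tracker.mark_filled(1)  # LSN 0 is still a hole, prefix stays empty
tracker.mark_filled(2)
tracker.mark_filled(0)  # hole closed, prefix now covers LSNs 0..2
```

    A real log manager would need atomic slot reservation and a concurrent structure in place of the plain heap, and would derive the batch size from measured workload rather than a fixed doubling rule.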


    Information

    Published In

    Frontiers of Computer Science: Selected Publications from Chinese Universities  Volume 17, Issue 2
    Apr 2023
    235 pages
    ISSN:2095-2228
    EISSN:2095-2236

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Publication History

    Published: 08 August 2022
    Accepted: 07 March 2022
    Received: 02 July 2021

    Author Tags

    1. distributed database systems
    2. transaction logging
    3. log replication
    4. scalable
    5. adaptive

    Qualifiers

    • Research-article
