Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3318464.3389716acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines

Published: 31 May 2020 Publication History

Abstract

For decades, ARIES has been the standard for logging and recovery in database systems. ARIES offers important features like support for arbitrary workloads, fuzzy checkpoints, and transparent index recovery. Nevertheless, many modern in-memory database systems use more lightweight approaches that have less overhead and better multi-core scalability but only work well for the in-memory setting. Recently, a new class of high-performance storage engines has emerged, which exploit fast SSDs to achieve performance close to pure in-memory systems but also allow out-of-memory workloads. For these systems, ARIES is too slow whereas in-memory logging proposals are not applicable. In this work, we propose a new logging and recovery design that supports incremental and fuzzy checkpointing, index recovery, out-of-memory workloads, and low-latency transaction commits. Our continuous checkpointing algorithm guarantees bounded recovery time. Using per-thread logging with minimal synchronization, our implementation achieves near-linear scalability on multi-core CPUs. We implemented and evaluated these techniques in our LeanStore storage engine. For working sets that fit in main memory, we achieve performance close to that of an in-memory approach, even with logging, checkpointing, and dirty page writing enabled. For the out-of-memory scenario, we outperform a state-of-the-art ARIES implementation by a factor of two.

Supplementary Material

MP4 File (3318464.3389716.mp4)
Presentation Video

References

[1]
Divyakant Agrawal, Amr El Abbadi, Richard Jeffers, and Lijing Lin. 1995. Ordered Shared Locks for Real-Time Databases. VLDB J., Vol. 4, 1 (1995).
[2]
Karolina Alexiou, Donald Kossmann, and Per-Åke Larson. 2013. Adaptive Range Filters for Cold Data: Avoiding Trips to Siberia. PVLDB, Vol. 6, 14 (2013).
[3]
Panagiotis Antonopoulos, Peter Byrne, Wayne Chen, Cristian Diaconu, Raghavendra Thallam Kodandaramaih, Hanuma Kodavalla, Prashanth Purnananda, Adrian-Leonard Radu, Chaitanya Sreenivas Ravella, and Girish Mittur Venkataramanappa. 2019. Constant Time Recovery in Azure SQL Database. PVLDB, Vol. 12, 12 (2019).
[4]
Raja Appuswamy, Renata Borovica-Gajic, Goetz Graefe, and Anastasia Ailamaki. 2017. The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy. In ADMS.
[5]
Joy Arulraj, Justin J. Levandoski, Umar Farooq Minhas, and Per-Åke Larson. 2018. BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory. PVLDB, Vol. 11, 5 (2018).
[6]
Joy Arulraj, Andrew Pavlo, and Subramanya Dulloor. 2015. Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems. In SIGMOD.
[7]
Joy Arulraj, Andy Pavlo, and Krishna Teja Malladi. 2019. Multi-Tier Buffer Management and Storage System Design for Non-Volatile Memory. CoRR (2019).
[8]
Joy Arulraj, Matthew Perron, and Andrew Pavlo. 2016. Write-Behind Logging. PVLDB, Vol. 10, 4 (2016).
[9]
Philip A. Bernstein and Sudipto Das. 2015. Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log. IEEE Data Eng. Bull., Vol. 38, 1 (2015).
[10]
Tuan Cao, Marcos Antonio Vaz Salles, Benjamin Sowell, Yao Yue, Alan J. Demers, Johannes Gehrke, and Walker M. White. 2011. Fast checkpoint recovery algorithms for frequently consistent applications. In SIGMOD.
[11]
Joel Coburn, Trevor Bunker, Meir Schwarz, Rajesh Gupta, and Steven Swanson. 2013. From ARIES to MARS: transaction support for next-generation, solid-state drives. In SOSP.
[12]
Justin DeBrabant, Andrew Pavlo, Stephen Tu, Michael Stonebraker, and Stanley B. Zdonik. 2013. Anti-Caching: A New Approach to Database Management System Architecture. PVLDB, Vol. 6, 14 (2013).
[13]
Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Åke Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL server's memory-optimized OLTP engine. In SIGMOD.
[14]
Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, and Yun Wang. 2011. High performance database logging using storage class memory.
[15]
Shen Gao, Jianliang Xu, Theo H"a rder, Bingsheng He, Byron Choi, and Haibo Hu. 2015. PCMLogging: Optimizing Transaction Logging and Recovery Performance with PCM. IEEE Trans. Knowl. Data Eng., Vol. 27, 12 (2015).
[16]
Goetz Graefe, Wey Guy, and Caetano Sauer. 2016. Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, Media Restore, and System Failover, Second Edition .Morgan & Claypool Publishers. https://doi.org/10.2200/S00710ED2V01Y201603DTM044
[17]
Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi A. Kuno, Joseph Tucek, Mark Lillibridge, and Alistair C. Veitch. 2014. In-Memory Performance for Big Data. PVLDB, Vol. 8, 1 (2014).
[18]
Gabriel Haas, Michael Haubenschild, and Viktor Leis. 2020. Exploiting Directly-Attached NVMe Arrays in DBMS. In CIDR.
[19]
Theo H"a rder and Andreas Reuter. 1983. Principles of Transaction-Oriented Database Recovery. ACM Comput. Surv., Vol. 15, 4 (1983).
[20]
Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker. 2008. OLTP through the looking glass, and what we found there. In SIGMOD.
[21]
Jian Huang, Karsten Schwan, and Moinuddin K. Qureshi. 2014. NVRAM-aware Logging in Transaction Systems. PVLDB, Vol. 8, 4 (2014).
[22]
Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis, and Anastasia Ailamaki. 2010. Aether: A Scalable Approach to Logging. PVLDB, Vol. 3, 1 (2010).
[23]
Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis, and Anastasia Ailamaki. 2012. Scalability of write-ahead logging on multicore and multisocket hardware. VLDB J., Vol. 21, 2 (2012).
[24]
Hyungsoo Jung, Hyuck Han, and Sooyong Kang. 2017. Scalable Database Logging for Multicores. PVLDB, Vol. 11, 2 (2017).
[25]
Jong-Bin Kim, Hyeongwon Jang, Seohui Son, Hyuck Han, Sooyong Kang, and Hyungsoo Jung. 2019. Border-Collie: A Wait-free, Read-optimal Algorithm for Database Logging on Multicore Hardware. In SIGMOD.
[26]
Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, and Ippokratis Pandis. 2016b. ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads. In SIGMOD.
[27]
Wook-Hee Kim, Jinwoong Kim, Woongki Baek, Beomseok Nam, and Youjip Won. 2016a. NVWAL: Exploiting NVRAM in Write-Ahead Logging. In ASPLOS.
[28]
Hideaki Kimura. 2015. FOEDUS: OLTP Engine for a Thousand Cores and NVRAM. In SIGMOD.
[29]
Leslie Lamport. 1978. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM, Vol. 21, 7 (1978).
[30]
Per-Åke Larson, Mike Zwilling, and Kevin Farlee. 2013. The Hekaton Memory-Optimized OLTP Engine. IEEE Data Eng. Bull., Vol. 36, 2 (2013).
[31]
Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In ICDE.
[32]
Viktor Leis, Michael Haubenschild, and Thomas Neumann. 2019. Optimistic Lock Coupling: A Scalable and Efficient General-Purpose Synchronization Method. IEEE Data Eng. Bull., Vol. 42, 1 (2019).
[33]
Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of practical synchronization. In DaMoN. ACM.
[34]
Justin J. Levandoski, Per-Åke Larson, and Radu Stoica. 2013. Identifying hot and cold data in main-memory databases. In ICDE.
[35]
David B. Lomet, Alan Fekete, Rui Wang, and Peter Ward. 2012. Multi-version Concurrency via Timestamp Range Conflict Management. In ICDE.
[36]
Nirmesh Malviya, Ariel Weisberg, Samuel Madden, and Michael Stonebraker. 2014. Rethinking main memory OLTP recovery. In ICDE.
[37]
C. Mohan. 1999. Repeating History Beyond ARIES. In VLDB.
[38]
C. Mohan, Don Haderle, Bruce G. Lindsay, Hamid Pirahesh, and Peter M. Schwarz. 1992. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM TODS (1992).
[39]
Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR.
[40]
Patrick E. O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O'Neil. 1996. The Log-Structured Merge-Tree (LSM-Tree). Acta Inf., Vol. 33, 4 (1996).
[41]
Ismail Oukid, Daniel Booss, Wolfgang Lehner, Peter Bumbulis, and Thomas Willhalm. 2014. SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery. In DaMoN.
[42]
Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, and Wolfgang Lehner. 2016. FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory. In SIGMOD.
[43]
Ismail Oukid, Wolfgang Lehner, Thomas Kissinger, Thomas Willhalm, and Peter Bumbulis. 2015. Instant Recovery for Main Memory Databases. In CIDR.
[44]
Steven Pelley, Thomas F. Wenisch, Brian T. Gold, and Bill Bridge. 2013. Storage Management in the NVRAM Era. PVLDB, Vol. 7, 2 (2013).
[45]
Kun Ren, Thaddeus Diamond, Daniel J. Abadi, and Alexander Thomson. 2016. Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems. In SIGMOD.
[46]
Caetano Sauer, Goetz Graefe, and Theo H"a rder. 2018. FineLine: log-structured transactional storage and recovery. PVLDB, Vol. 11, 13 (2018).
[47]
Vishal Sikka, Franz F"a rber, Wolfgang Lehner, Sang Kyun Cha, Thomas Peh, and Christof Bornhö vd. 2012. Efficient transaction processing in SAP HANA database: the end of a column store myth. In SIGMOD.
[48]
Hironobu Suzuki. 2016. The Internals of PostgreSQL .Self-published. http://www.interdb.jp/pg/
[49]
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy transactions in multicore in-memory databases. In SIGOPS.
[50]
Alexander van Renen, Viktor Leis, Alfons Kemper, Thomas Neumann, Takushi Hashida, Kazuichi Oe, Yoshiyasu Doi, Lilian Harada, and Mitsuru Sato. 2018. Managing Non-Volatile Memory in Database Systems. In SIGMOD.
[51]
Alexander van Renen, Lukas Vogel, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2019. Persistent Memory I/O Primitives. In DaMoN.
[52]
Tianzheng Wang and Ryan Johnson. 2014. Scalable Logging through Emerging Non-Volatile Memory. PVLDB, Vol. 7, 10 (2014).
[53]
Tianzheng Wang, Ryan Johnson, and Ippokratis Pandis. 2017. Query Fresh: Log Shipping on Steroids. PVLDB, Vol. 11, 4 (2017).
[54]
Chang Yao, Divyakant Agrawal, Gang Chen, Beng Chin Ooi, and Sai Wu. 2016. Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases. In SIGMOD.
[55]
Wenting Zheng, Stephen Tu, Eddie Kohler, and Barbara Liskov. 2014. Fast Databases with Fast Durability and Recovery Through Multicore Parallelism. In OSDI.

Cited By

View all
  • (2024)Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAPProceedings of the VLDB Endowment10.14778/3681954.368200117:11(3290-3303)Online publication date: 1-Jul-2024
  • (2024)TimeCloth: Fast Point-in-Time Database Recovery in The CloudCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653382(214-226)Online publication date: 9-Jun-2024
  • (2024)Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization With Hybrid MemoryIEEE Transactions on Computers10.1109/TC.2024.338527973:7(1866-1874)Online publication date: 4-Apr-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. checkpointing
  2. leanstore
  3. logging
  4. recovery
  5. storage engine

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)737
  • Downloads (Last 6 weeks)109
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAPProceedings of the VLDB Endowment10.14778/3681954.368200117:11(3290-3303)Online publication date: 1-Jul-2024
  • (2024)TimeCloth: Fast Point-in-Time Database Recovery in The CloudCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3653382(214-226)Online publication date: 9-Jun-2024
  • (2024)Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization With Hybrid MemoryIEEE Transactions on Computers10.1109/TC.2024.338527973:7(1866-1874)Online publication date: 4-Apr-2024
  • (2024)Riveter: Adaptive Query Suspension and Resumption Framework for Cloud Native Databases2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00304(3975-3988)Online publication date: 13-May-2024
  • (2024)Why Files If You Have a DBMS?2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00297(3878-3892)Online publication date: 13-May-2024
  • (2024)Fast Parallel Recovery for Transactional Stream Processing on Multicores2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00122(1478-1491)Online publication date: 13-May-2024
  • (2023)Breathing New Life into an Old Tree: Resolving Logging Dilemma of B+-tree on Modern Computational Storage DrivesProceedings of the VLDB Endowment10.14778/3626292.362629717:2(134-147)Online publication date: 1-Oct-2023
  • (2023)DecLog: Decentralized Logging in Non-Volatile Memory for Time Series Database SystemsProceedings of the VLDB Endowment10.14778/3617838.361783917:1(1-14)Online publication date: 4-Dec-2023
  • (2023)What Modern NVMe Storage Can Do, and How to Exploit it: High-Performance I/O for High-Performance Storage EnginesProceedings of the VLDB Endowment10.14778/3598581.359858416:9(2090-2102)Online publication date: 10-Jul-2023
  • (2023)Scalable and Robust Snapshot Isolation for High-Performance Storage EnginesProceedings of the VLDB Endowment10.14778/3583140.358315716:6(1426-1438)Online publication date: 20-Apr-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media