Magma: A High Data Density Storage Engine Used in Couchbase

Published: 01 August 2022

Abstract

We present Magma, a write-optimized, high-data-density key-value storage engine used in the Couchbase NoSQL distributed document database. Today's write-heavy, data-intensive applications, such as ad serving, the Internet of Things, messaging, and online gaming, generate massive amounts of data. As a result, the requirement for storing and retrieving large volumes of data has grown rapidly. Distributed databases that scale out horizontally by adding nodes can serve these internet-scale applications. To keep the cost of ownership reasonable, however, we need to improve storage efficiency so that each node handles a larger data volume and we do not have to rely on adding more nodes. Our current-generation storage engine, Couchstore, is based on a log-structured, append-only, copy-on-write B+Tree architecture. To substantially improve data density and write throughput, we needed a storage engine architecture that lowers write amplification and avoids compaction operations that periodically rewrite entire database files.
We introduce Magma, a hybrid key-value storage engine that combines LSM trees with a segmented log approach borrowed from log-structured file systems. We present a novel approach to garbage-collecting stale document versions that avoids index lookups during log segment compaction. This approach is the key to Magma's storage efficiency and eliminates the need for random I/O during compaction. Magma offers significantly lower write amplification, scalable incremental compaction, and lower space amplification, without regressing read amplification. Through these efficiency improvements, we increased the single-machine data density supported by Couchbase Server by 3.3x and lowered the memory requirement by 10x, thereby reducing the total cost of ownership by up to 10x. Our evaluation results show that Magma outperforms Couchstore and RocksDB on write-heavy workloads.
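
The abstract does not spell out how stale versions are found without index lookups, so the following is only a toy sketch of the general idea, not Magma's actual design. It assumes (hypothetically) that whenever a key is overwritten, the superseded record's sequence number is noted against the log segment that holds it; compacting a segment can then drop those records by consulting that per-segment set alone. The names `Segment`, `ToyStore`, `stale_seqnos`, and `compact_segment` are invented for illustration, a plain dict stands in for the LSM-tree index, and compaction filters a segment in place where a real engine would write a new file.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One append-only log segment holding (seqno, key, value) records."""
    records: list = field(default_factory=list)
    # Sequence numbers in this segment that a later write has superseded.
    # Assumption of this sketch: the write path fills this in.
    stale_seqnos: set = field(default_factory=set)


class ToyStore:
    """Toy hybrid store: a dict plays the key index, a list plays the log."""

    def __init__(self, segment_capacity=2):
        self.segment_capacity = segment_capacity
        self.index = {}              # key -> (segment_id, seqno) of latest version
        self.segments = [Segment()]  # last segment is the active one
        self.next_seqno = 1

    def put(self, key, value):
        # Mark the previous version (if any) as stale in its own segment.
        old = self.index.get(key)
        if old is not None:
            old_segment_id, old_seqno = old
            self.segments[old_segment_id].stale_seqnos.add(old_seqno)
        # Roll over to a fresh segment when the active one is full.
        if len(self.segments[-1].records) >= self.segment_capacity:
            self.segments.append(Segment())
        seqno = self.next_seqno
        self.next_seqno += 1
        self.segments[-1].records.append((seqno, key, value))
        self.index[key] = (len(self.segments) - 1, seqno)

    def get(self, key):
        segment_id, seqno = self.index[key]
        for rec_seqno, _, value in self.segments[segment_id].records:
            if rec_seqno == seqno:
                return value
        raise KeyError(key)

    def compact_segment(self, segment_id):
        """Drop stale records using only the segment's own stale set.

        The key index is never read here; returns the number of records dropped.
        """
        seg = self.segments[segment_id]
        before = len(seg.records)
        seg.records = [r for r in seg.records if r[0] not in seg.stale_seqnos]
        seg.stale_seqnos.clear()
        return before - len(seg.records)


# Usage: overwrite "a", then compact the first segment.
store = ToyStore(segment_capacity=2)
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)                      # supersedes seqno 1 in segment 0
reclaimed = store.compact_segment(0)   # drops the stale version of "a"
```

In this sketch compaction is a sequential pass over one segment, which is the property the abstract highlights: deciding which records are live needs no per-record index probe and therefore no random I/O. The cost is moved to the write path, which is a simplification chosen here for brevity.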

Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 12
August 2022
551 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
