DOI: 10.1145/3323165.3323210
research-article
Public Access

Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design

Published: 17 June 2019

Abstract

Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-Access Machine (DAM) model simplifies reality by assuming that storage devices transfer data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM model is reasonably accurate. In fact, if B is set to the half-bandwidth point, the IO size at which the time spent on setup (latency) equals the time spent transferring data, the DAM approximates the IO cost on any hardware to within a factor of 2. Furthermore, the DAM explains the popularity of B-trees in the 1970s and the current popularity of Bε-trees and log-structured merge trees. But it fails to explain why some B-trees use small nodes, whereas all Bε-trees use large nodes. In a DAM, all IOs, and hence all nodes, are the same size. In this paper, we show that the affine and PDAM models, which are small refinements of the DAM model, yield a surprisingly large improvement in predictability without sacrificing ease of use. We present benchmarks on a large collection of storage devices showing that the affine and PDAM models give good approximations of the performance characteristics of hard drives and SSDs, respectively. We show that the affine model explains node-size choices in B-trees and Bε-trees. Furthermore, the models predict that the B-tree is highly sensitive to variations in the node size, whereas Bε-trees are much less sensitive. These predictions are borne out empirically. Finally, we show that in both the affine and PDAM models, it pays to organize data structures to exploit varying IO size. In the affine model, Bε-trees can be optimized so that all operations are simultaneously optimal, even up to lower-order terms. In the PDAM model, Bε-trees (or B-trees) can be organized so that both sequential and concurrent workloads are handled efficiently.
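To make the affine model and the half-bandwidth point concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' code or their exact cost analysis. It assumes an IO of k bytes costs SETUP + ALPHA·k, with time normalized so SETUP = 1 and a hypothetical ALPHA chosen so the half-bandwidth point is exactly 1 MiB. It checks that a DAM estimate with B at the half-bandwidth point stays within a factor of 2 of the affine cost for a single contiguous transfer. It also picks the cheapest B-tree node size under a simplified search-cost formula. The helper names `affine_cost`, `dam_estimate`, and `btree_search_cost` are hypothetical.

```python
import math

# Minimal affine-model sketch. Time is normalized so one IO's setup cost is 1;
# ALPHA (cost per byte transferred) is a hypothetical value, not a measurement.
SETUP = 1.0
ALPHA = 2.0 ** -20  # hypothetical: makes the half-bandwidth point exactly 1 MiB


def affine_cost(k: float) -> float:
    """Cost of one contiguous IO of k bytes: setup plus linear transfer cost."""
    return SETUP + ALPHA * k


def half_bandwidth_point() -> float:
    """IO size at which transfer time equals setup time (half of peak bandwidth)."""
    return SETUP / ALPHA


def dam_estimate(k: float) -> float:
    """DAM estimate with B at the half-bandwidth point: ceil(k / B) unit IOs,
    each priced at the affine cost of one B-byte IO (which is 2 * SETUP)."""
    b = half_bandwidth_point()
    return math.ceil(k / b) * affine_cost(b)


def btree_search_cost(node_bytes: int, n_items: int, item_bytes: int) -> float:
    """Affine cost of a root-to-leaf B-tree search, assuming fanout
    node_bytes / item_bytes, log_fanout(n_items) levels (left unrounded),
    and one node-sized IO per level."""
    fanout = node_bytes / item_bytes
    levels = math.log(n_items) / math.log(fanout)
    return levels * affine_cost(node_bytes)


if __name__ == "__main__":
    B = half_bandwidth_point()
    for k in (1, 4096, B / 2, B, B + 1, 10 * B):
        ratio = dam_estimate(k) / affine_cost(k)
        print(f"k={k:>12.0f}  DAM/affine = {ratio:.3f}")  # stays within [1, 2]

    sizes = [2 ** e for e in range(9, 25)]  # candidate nodes: 512 B .. 16 MiB
    best = min(sizes, key=lambda s: btree_search_cost(s, 2 ** 30, 16))
    print(f"cheapest node size: {best} bytes; half-bandwidth point: {B:.0f} bytes")
```

With these assumed parameters (2^30 keys of 16 bytes each), the cheapest node size is 128 KiB, well below the 1 MiB half-bandwidth point, and a node at the half-bandwidth point costs about 44% more per search. This is only a toy illustration of how the affine model can favor smaller B-tree nodes than a DAM with B at the half-bandwidth point would suggest.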
We conclude that the DAM model is useful as a first cut when designing or analyzing an algorithm or data structure, but the affine and PDAM models enable the algorithm designer to optimize parameter choices and fill in design details.
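The PDAM tradeoff between sequential and concurrent workloads can be illustrated with a toy calculation. The sketch below is a simplification and does not reproduce the paper's construction. It assumes that the device serves up to P block IOs per time step, that a B-tree node spans `node_blocks` blocks (each holding `items_per_block` keys), and that concurrent queries pack perfectly into time steps. The function names and parameter values are hypothetical.

```python
import math


def tree_levels(node_blocks: int, n_items: int, items_per_block: int) -> float:
    """Levels of a B-tree whose nodes span node_blocks blocks (fanout =
    node_blocks * items_per_block); left unrounded for simplicity."""
    return math.log2(n_items) / math.log2(node_blocks * items_per_block)


def single_query_steps(node_blocks: int, p: int, n_items: int, items_per_block: int) -> float:
    """PDAM time steps for one query running alone: at each level its
    node_blocks block reads are issued together, taking ceil(node_blocks / p) steps."""
    return tree_levels(node_blocks, n_items, items_per_block) * math.ceil(node_blocks / p)


def batch_steps(num_queries: int, node_blocks: int, p: int, n_items: int, items_per_block: int) -> float:
    """Idealized PDAM time steps for num_queries independent queries sharing the
    device: bounded below by one query's latency and by total block reads / p."""
    total_block_reads = num_queries * node_blocks * tree_levels(node_blocks, n_items, items_per_block)
    return max(single_query_steps(node_blocks, p, n_items, items_per_block),
               total_block_reads / p)


if __name__ == "__main__":
    P, N, PER_BLOCK, QUERIES = 16, 2 ** 30, 64, 16  # hypothetical parameters
    for node_blocks in (1, 4, 16):
        lone = single_query_steps(node_blocks, P, N, PER_BLOCK)
        batch = batch_steps(QUERIES, node_blocks, P, N, PER_BLOCK)
        print(f"node = {node_blocks:2d} blocks: lone query {lone:.1f} steps, "
              f"{QUERIES} concurrent queries {batch:.1f} steps")
```

With P = 16, 64 keys per block, and 2^30 keys, one-block nodes give 5 steps for a lone query and 5 steps for 16 concurrent queries. Sixteen-block nodes cut the lone query to 3 steps but raise the batch to 48 steps. In this toy model a single fixed node size favors one workload at the expense of the other.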

Published In

SPAA '19: The 31st ACM Symposium on Parallelism in Algorithms and Architectures
June 2019
410 pages
ISBN:9781450361842
DOI:10.1145/3323165
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BTree
  2. DAM
  3. LSM
  4. NVME
  5. PDAM
  6. SSD
  7. external-memory algorithms
  8. write-optimization

Qualifiers

  • Research-article

Conference

SPAA '19

Acceptance Rates

SPAA '19 Paper Acceptance Rate 34 of 109 submissions, 31%;
Overall Acceptance Rate 447 of 1,461 submissions, 31%

Article Metrics

  • Downloads (Last 12 months): 146
  • Downloads (Last 6 weeks): 19
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 348-363. DOI: 10.1145/3627535.3638492. Online publication date: 2-Mar-2024.
  • (2024) Brief Announcement: Root-to-Leaf Scheduling in Write-Optimized Trees. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 475-477. DOI: 10.1145/3626183.3660514. Online publication date: 17-Jun-2024.
  • (2023) BP-Tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-Trees. Proceedings of the VLDB Endowment 16:11, pp. 2976-2989. DOI: 10.14778/3611479.3611502. Online publication date: 24-Aug-2023.
  • (2022) Kangaroo: Theory and Practice of Caching Billions of Tiny Objects on Flash. ACM Transactions on Storage. DOI: 10.1145/3542928. Online publication date: 13-Jun-2022.
  • (2022) Automatic HBM Management. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 147-159. DOI: 10.1145/3490148.3538570. Online publication date: 11-Jul-2022.
  • (2021) Synbit: synthesizing bidirectional programs using unidirectional sketches. Proceedings of the ACM on Programming Languages 5:OOPSLA, pp. 1-31. DOI: 10.1145/3485482. Online publication date: 15-Oct-2021.
  • (2021) SecRSL: security separation logic for C11 release-acquire concurrency. Proceedings of the ACM on Programming Languages 5:OOPSLA, pp. 1-26. DOI: 10.1145/3485476. Online publication date: 15-Oct-2021.
  • (2021) On Directed Densest Subgraph Discovery. ACM Transactions on Database Systems 46:4, pp. 1-45. DOI: 10.1145/3483940. Online publication date: 15-Nov-2021.
  • (2021) Kangaroo. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pp. 243-262. DOI: 10.1145/3477132.3483568. Online publication date: 26-Oct-2021.
  • (2021) Timely Reporting of Heavy Hitters Using External Memory. ACM Transactions on Database Systems 46:4, pp. 1-35. DOI: 10.1145/3472392. Online publication date: 15-Nov-2021.
