DOI: 10.1145/2882903.2912569
research-article
Public Access

Design Tradeoffs of Data Access Methods

Published: 26 June 2016

Abstract

Database researchers and practitioners have been building methods to store, access, and update data for more than five decades. Designing access methods has been a constant effort to adapt to ever-changing hardware and workload requirements. The recent explosion in data system designs - including, in addition to traditional SQL systems, NoSQL, NewSQL, and other relational and non-relational systems - makes understanding the tradeoffs of designing access methods more important than ever. Access methods are at the core of any new data system. In this tutorial we survey recent developments in access method design and place them in a design space where each approach focuses primarily on one or a subset of read performance, update performance, and memory utilization. We discuss how to use designs and lessons learned from past research. In addition, we discuss new ideas on how to build access methods with tunable behavior, and we outline the landscape of open research problems.

Information & Contributors

Information

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. access methods
  2. approximate indexing
  3. cache optimizations
  4. continuous reorganization
  5. data skipping
  6. design tradeoffs
  7. differential updates
  8. log-structure design
  9. logarithmic structure
  10. read-optimized indexing
  11. RUM tradeoffs
  12. space-efficient indexing
  13. update-optimized indexing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 412
  • Downloads (Last 6 weeks): 50
Reflects downloads up to 11 Sep 2024

Cited By

  • (2024) Optimizing Data Retrieval from Secondary Storage with a Proactive Intermediate Cache. SoutheastCon 2024, pp. 216-221. DOI: 10.1109/SoutheastCon52093.2024.10500105. Online publication date: 15-Mar-2024
  • (2023) Fine-Tuning Data Structures for Query Processing. Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, pp. 149-161. DOI: 10.1145/3579990.3580016. Online publication date: 17-Feb-2023
  • (2023) ObjDedup: High-Throughput Object Storage Layer for Backup Systems With Block-Level Deduplication. IEEE Transactions on Parallel and Distributed Systems, 34:7, pp. 2180-2197. DOI: 10.1109/TPDS.2023.3250501. Online publication date: Jul-2023
  • (2023) The LSM Design Space and its Read Optimizations. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3578-3584. DOI: 10.1109/ICDE55515.2023.00273. Online publication date: Apr-2023
  • (2022) Dissecting, Designing, and Optimizing LSM-based Data Stores. Proceedings of the 2022 International Conference on Management of Data, pp. 2489-2497. DOI: 10.1145/3514221.3522563. Online publication date: 10-Jun-2022
  • (2022) A design space for RDF data representations. The VLDB Journal — The International Journal on Very Large Data Bases, 31:2, pp. 347-373. DOI: 10.1007/s00778-021-00725-x. Online publication date: 21-Jan-2022
  • (2021) Constructing and analyzing the LSM compaction design space. Proceedings of the VLDB Endowment, 14:11, pp. 2216-2229. DOI: 10.14778/3476249.3476274. Online publication date: 27-Oct-2021
  • (2019) Jungle. Proceedings of the 11th USENIX Conference on Hot Topics in Storage and File Systems, pp. 9-9. DOI: 10.5555/3357062.3357074. Online publication date: 8-Jul-2019
  • (2019) Optimal column layout for hybrid workloads. Proceedings of the VLDB Endowment, 12:13, pp. 2393-2407. DOI: 10.14778/3358701.3358707. Online publication date: 1-Sep-2019
  • (2019) FITing-Tree. Proceedings of the 2019 International Conference on Management of Data, pp. 1189-1206. DOI: 10.1145/3299869.3319860. Online publication date: 25-Jun-2019
