research-article

CORES: Towards Scan-Optimized Columnar Storage for Nested Records

Published: 26 June 2019

Abstract

The relatively high cost of record deserialization is increasingly the bottleneck of column-based storage systems in tree-structured applications [58]. Because records are transformed in the storage layer, processing fields and rows that are irrelevant to a query can be very costly in nested schemas, wasting substantial computational resources in large-scale analytical workloads. This raises the question of how to reduce both the deserialization and IO costs of queries whose highly selective filters follow arbitrary paths in a nested schema.
We present CORES (Column-Oriented Regeneration Embedding Scheme), which pushes highly selective filters down into column-based storage engines, where each filter consists of several filtering conditions on a field. By applying highly selective filters in the storage layer, we demonstrate that both the deserialization and IO costs can be significantly reduced. We show how to introduce fine-grained composition on filtering results. We generalize this technique with two pair-wise operations, rollup and drilldown, so that a series of conjunctive filters can effectively deliver their payloads in a nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries. Experiments on a real workload and a modified TPC-H benchmark demonstrate that CORES improves performance by 0.7× to 26.9× compared to state-of-the-art platforms in scan-intensive workloads.
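To make the idea of composing filter results across nesting levels more concrete, here is a minimal Python sketch. It is not the CORES implementation. The function names `rollup` and `drilldown` borrow the abstract's terms, but their semantics here (a parent qualifies if any child matches; a child inherits its parent's bit), the `child_counts` layout, and the customer/order example data are all assumptions made for illustration.

```python
from typing import List


def rollup(child_hits: List[bool], child_counts: List[int]) -> List[bool]:
    """Lift child-level filter bits to the parent level.

    Assumed semantics: a parent qualifies if at least one of its
    children satisfied the child-level filter.
    """
    parent_hits = []
    pos = 0
    for n in child_counts:
        parent_hits.append(any(child_hits[pos:pos + n]))
        pos += n
    return parent_hits


def drilldown(parent_hits: List[bool], child_counts: List[int]) -> List[bool]:
    """Push parent-level filter bits down to the child level.

    Assumed semantics: every child inherits the bit of its parent.
    """
    child_hits = []
    for hit, n in zip(parent_hits, child_counts):
        child_hits.extend([hit] * n)
    return child_hits


def conjunction(a: List[bool], b: List[bool]) -> List[bool]:
    """Bitwise AND of two equally long bit lists."""
    return [x and y for x, y in zip(a, b)]


# Hypothetical nested data: three customers owning 2, 0, and 3 orders.
child_counts = [2, 0, 3]
customer_hits = [True, True, False]             # e.g. a filter on a customer field
order_hits = [False, True, False, False, False]  # e.g. a filter on an order field

# Customers that pass their own filter AND have at least one matching order.
selected_customers = conjunction(customer_hits, rollup(order_hits, child_counts))

# Orders that pass their own filter AND belong to a selected customer.
selected_orders = conjunction(order_hits, drilldown(selected_customers, child_counts))
```

In this sketch, only the positions whose final bit is set would need to be read and deserialized, which is the intuition behind evaluating selective filters in the storage layer rather than after full records have been reassembled.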

References

[1]
Apache. 2017. Apache Hive. Retrieved June 13, 2019 from https://hive.apache.org.
[2]
Apache. 2017. Apache Parquet. Retrieved June 13, 2019 from https://parquet.apache.org.
[3]
Apache. 2017. Apache Spark. Retrieved June 13, 2019 from https://spark.apache.org.
[4]
Apache. 2017. Apache Tez. Retrieved June 13, 2019 from https://tez.apache.org.
[5]
Apache. 2018. Apache AsterixDB. Retrieved June 13, 2019 from https://asterixdb.apache.org.
[6]
Apache. 2018. Apache Avro. Retrieved June 13, 2019 from https://avro.apache.org.
[7]
Google. 2017. Protocol buffer. Retrieved June 13, 2019 from http://code.google.com/p/protobuf/.
[8]
Yang Li. 2018. Cores. Retrieved June 13, 2019 from https://github.com/lwhay/cores.
[9]
NCBI. 2018. Retrieved June 13, 2019 from http://www.ncbi.nlm.nih.gov.
[10]
TPC. 2017. TPC-H benchmark. Retrieved June 13, 2019 from http://www.tpc.org/tpch.
[11]
Foto N. Afrati, Dan Delorey, Mosha Pasumansky, and Jeffrey D. Ullman. 2014. Storing and querying tree-structured records in Dremel. PVLDB 7, 12 (2014), 1131--1142.
[12]
Anastassia Ailamaki, David J. DeWitt, and Mark D. Hill. 2002. Data page layouts for relational databases on deep memory hierarchies. Springer-Verlag New York, Inc. 198--215.
[13]
Sattam Alsubaiee, Yasser Altowim, et al. 2014. AsterixDB: a scalable, open source BDMS. PVLDB 7, 14 (2014), 1905--1916.
[14]
Sattam Alsubaiee, Alexander Behm, Vinayak R. Borkar, et al. 2014. Storage management in AsterixDB. PVLDB 7, 10 (2014), 841--852.
[15]
Gopi Attaluri, Shaorong Liu, and Guy M. Lohman. 2013. DB2 with BLU acceleration: So much more than just a column store. Proceedings of the VLDB Endowment 6, 11 (2013), 1080--1091.
[16]
François Bancilhon, Philippe Richard, and Michel Scholl. 1982. On line processing of compacted relations. In Proceedings of the 8th International Conference on Very Large Data Bases. 263--269.
[17]
Babak Behzad, Huong Vu Thanh Luu, et al. 2013. Taming parallel I/O complexity with auto-tuning. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis. 68--79.
[18]
Kevin Beyer and Raghu Ramakrishnan. 1999. Bottom-up computation of sparse and iceberg CUBEs. SIGMOD Record 28, 2 (1999), 359--370.
[19]
Medha Bhadkamkar, Fernando Farfan, Vagelis Hristidis, and Raju Rangaswami. 2009. Storing semi-structured data on disk drives. ACM Trans. Storage 5, 2, Article 6 (June 2009), 35 pages.
[20]
Peter Boncz, Torsten Grust, Maurice Van Keulen, Stefan Manegold, Jan Rittinger, and Jens Teubner. 2006. MonetDB/XQuery: A fast XQuery processor powered by a relational engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 479--490.
[21]
Vinayak Borkar, Michael Carey, Raman Grover, et al. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proceedings of the IEEE International Conference on Data Engineering. 1151--1162.
[22]
C. Chasseur, Yinan Li, and J. M. Patel. 2013. Enabling JSON document stores in relational systems (long version). In Proceedings of the International Workshop on the Web and Databases. 1--16.
[23]
Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih. 2018. UnistorFS: A union storage file system design for resource sharing between memory and storage on persistent RAM-based systems. ACM Trans. Storage 14, 1, Article 3 (Feb. 2018), 22 pages.
[24]
Douglas W. Comer and Philip S. Yu. 1987. A vertical partitioning algorithm for relational databases. In Proceedings of the IEEE International Conference on Data Engineering. 30--35.
[25]
Graham Cormode, Minos Garofalakis, et al. 2012. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends Datab. 4, 1 (2012), 1--294.
[26]
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 10.
[27]
M. J. Egenhofer. 1994. Spatial SQL: A query and presentation language. IEEE Trans. Knowl. Data Eng. 6, 1 (1994), 86--95.
[28]
Avrilia Floratou and Umar Farooq Minhas. 2014. SQL-on-Hadoop: Full circle back to shared-nothing database architectures. Proceedings of the VLDB Endowment 7, 12 (Jan. 2014), 1295--1306.
[29]
Raúl Gracia-Tinedo, Josep Sampé, et al. 2017. Crystal: Software-defined storage for multi-tenant object stores. In Proceedings of the USENIX Conference on File and Storage Technologies.
[30]
Bin He, Hui I. Hsiao, Ziyang Liu, Yu Huang, and Yi Chen. 2012. Efficient iceberg query evaluation using compressed bitmap index. IEEE Trans. Knowl. Data Eng. 24, 9 (2012), 1570--1583.
[31]
Jianfeng Jia, Chen Li, and Michael J. Carey. 2017. Drum: A rhythmic approach to interactive analytics on large data. In Proceedings of the IEEE International Conference on Big Data.
[32]
Martin Kaufmann and Donald Kossmann. 2013. Storing and processing temporal data in a main memory column store. Proceedings of the VLDB Endowment 6, 12 (2013), 1444--1449.
[33]
Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, and Teck Hua Lee. 2015. Oracle database in-memory: A dual format in-memory database. In Proceedings of the IEEE International Conference on Data Engineering. 1253--1258.
[34]
Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, et al. 2012. The Vertica analytic database: C-Store 7 years later. PVLDB 5, 12 (2012), 1790--1801.
[35]
Eunji Lee and Hyokyung Bahn. 2014. Caching strategies for high-performance storage media. ACM Trans. Storage 10, 3, Article 11 (Aug. 2014), 22 pages.
[36]
Daniel Lemire, Robert Godin, et al. 2016. Optimizing Druid with Roaring bitmaps. In Proceedings of the International Database Engineering & Applications Symposium. 77--86.
[37]
Hang Liu and H. Howie Huang. 2017. Graphene: Fine-grained IO management for graph computing. In Proceedings of the USENIX Conference on File and Storage Technologies. 285--299.
[38]
Zhen Hua Liu, Beda Hammerschmidt, and Doug McMahon. 2014. JSON data management: Supporting schema-less development in RDBMS. In Proceedings of the ACM SIGMOD International Conference on Management of Data 7, 2 (2014), 1247--1258.
[39]
Peng Lu, Sai Wu, Lidan Shou, and Kian-Lee Tan. 2013. An efficient and compact indexing scheme for large-scale data store. In Proceedings of the IEEE International Conference on Data Engineering. 326--337.
[40]
Sagar S. Mane and M. Emmanuel. 2015. Review and comparative study of bitmap indexing techniques. Data Mining Knowl. Eng. 7, 1 (2015).
[41]
Sergey Melnik, Andrey Gubarev, Jing Jing Long, et al. 2010. Dremel: Interactive analysis of web-scale datasets. Commun. ACM 3, 12 (2010), 114--123.
[42]
Jan Paredaens and Dirk Van Gucht. 1988. Possibilities and limitations of using flat operators in nested algebra expressions. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 29--38.
[43]
H. B. Paul, H. J. Schek, and M. H. Scholl. 1987. Architecture and implementation of the Darmstadt database kernel system. In Proceedings of the ACM SIGMOD Conference. 196--207.
[44]
Mark A. Roth, Henry F. Korth, and Abraham Silberschatz. 1988. Extended algebra and calculus for nested relational databases. ACM Trans. Datab. Syst. 13, 4 (1988), 389--417.
[45]
Michael Rys and Gerhard Weikum. 1994. Heuristic optimization of speedup and benefit/cost for parallel database scans on shared-memory multiprocessors. In Proceedings of the International Parallel Processing Symposium. 894--901.
[46]
Marc H. Scholl, H.-Bernhard Paul, and Hans-Jörg Schek. 1987. Supporting flat relations by a nested relational kernel. In Proceedings of the International Conference on Very Large Data Bases. 137--146.
[47]
Anil Shanbhag, Alekh Jindal, Yi Lu, and Samuel Madden. 2016. Amoeba: A shape changing storage system for big data. PVLDB 9, 13 (2016), 1569--1572.
[48]
Jeff Shute, Radek Vingralek, et al. 2013. F1: A distributed SQL database that scales. Proceedings of the VLDB Endowment 6, 11 (2013), 1068--1079.
[49]
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In Proceedings of the IEEE Symposium on Mass Storage Systems and Technologies. 1--10.
[50]
Laure Soulier and Lynda Tamine. 2017. On the collaboration support in information retrieval. ACM Comput. Surv. 50, 4, Article 51 (2017), 34 pages.
[51]
Kurt Stockinger. 2001. Design and implementation of bitmap indices for scientific data. In Proceedings of the International Database Engineering and Applications Symposium. 47--57.
[52]
Mike Stonebraker, Daniel J. Abadi, Adam Batkin, et al. 2005. C-Store: A column-oriented DBMS. In Proceedings of the International Conference on Very Large Data Bases. 553--564.
[53]
Liwen Sun, Sanjay Krishnan, Reynold S. Xin, and Michael J. Franklin. 2014. A partitioning framework for aggressive data skipping. PVLDB 7, 13 (2014), 1617--1620.
[54]
Yuliang Sun, Yu Wang, and Huazhong Yang. 2018. Bidirectional database storage and SQL query exploiting RRAM-based process-in-memory structure. ACM Trans. Storage 14, 1, Article 8 (March 2018), 19 pages.
[55]
Daniel Tahara, Thaddeus Diamond, and Daniel J. Abadi. 2014. Sinew: A SQL system for multi-structured data. In Proceedings of ACM SIGMOD Conference. 815--826.
[56]
Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, and Sam Madden. 2012. Lookup tables: Fine-grained partitioning for distributed databases. In Proceedings of the IEEE International Conference on Data Engineering. 102--113.
[57]
Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, et al. 2014. State-of-the-art in string similarity search and join. SIGMOD Record 43, 1 (2014), 64--76.
[58]
Zhiyi Wang and Shimin Chen. 2017. Exploiting common patterns for tree-structured data. In Proceedings of the ACM SIGMOD Conference. 883--896.
[59]
Brent Welch, Marc Unangst, Zainul Abbasi, et al. 2008. Scalable performance of the Panasas parallel file system. In Proceedings of the USENIX Conference on File and Storage Technologies. 2.
[60]
Chin-Hsien Wu and Kuo-Yi Huang. 2015. Data sorting in flash memory. ACM Trans. Storage 11, 2, Article 7 (March 2015), 25 pages.
[61]
Pengfei Xuan, Walter B. Ligon, Pradip K. Srimani, Rong Ge, and Feng Luo. 2016. Accelerating big data analytics on HPC clusters using two-level storage. Parallel Comput. 61 (2016).
[62]
Atsuo Yoshitaka and Tadao Ichikawa. 1999. A survey on content-based retrieval for multimedia databases. IEEE Trans. Knowl. Data Eng. 11, 1 (1999), 81--93.
[63]
Yuan Yu, Michael Isard, Dennis Fetterly, et al. 2009. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1--14.
[64]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, et al. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 2.
[65]
Yansong Zhang, Xuan Zhou, Ying Zhang, et al. 2016. Virtual denormalization via array index reference for main memory OLAP. IEEE Trans. Knowl. Data Eng. 28, 4 (2016), 1061--1074.

Cited By

  • (2023) Accelerating Columnar Storage Based on Asynchronous Skipping Strategy. Big Data Research 31 (100352). DOI: 10.1016/j.bdr.2022.100352. Online publication date: Feb-2023
  • (2022) In-Memory Indexed Caching for Distributed Data Processing. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (104-114). DOI: 10.1109/IPDPS53621.2022.00019. Online publication date: May-2022

Published In

ACM Transactions on Storage, Volume 15, Issue 3
August 2019
173 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3336116
Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2019
Accepted: 01 March 2019
Revised: 01 November 2018
Received: 01 April 2018
Published in TOS Volume 15, Issue 3

Author Tags

  1. Columnar storage
  2. bitset composition
  3. filtering-pushdown
  4. nested schema
  5. sequential scan
  6. skipping scheme

Qualifiers

  • Research-article
  • Research
  • Refereed
