research-article

CORES: Towards Scan-Optimized Columnar Storage for Nested Records

Authors:

Yanxiang He

ACM Transactions on Storage (TOS), Volume 15, Issue 3

Article No.: 16, Pages 1 - 46

https://doi.org/10.1145/3321704

Published: 26 June 2019

Abstract

The relatively high cost of record deserialization is increasingly becoming the bottleneck of column-based storage systems in tree-structured applications [58]. Because records are transformed in the storage layer, the unnecessary cost of processing fields and rows that are irrelevant to a query can be very heavy in nested schemas, wasting significant computational resources in large-scale analytical workloads. This raises the question of how to reduce both the deserialization and IO costs of queries whose highly selective filters follow arbitrary paths in a nested schema.
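To make the cost argument concrete, the following Python sketch contrasts two generic scan strategies over a flat columnar table: assembling every record before filtering, versus evaluating a selective filter on its own column and decoding the other columns only at qualifying rows (late materialization). It illustrates the general principle under simplifying assumptions (in-memory lists stand in for encoded column chunks, and "deserialization" is simply counted per value read); it is not the CORES implementation, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A hypothetical flat table stored column by column (one Python list per field).
Columns = Dict[str, List[object]]
Predicate = Callable[[object], bool]


def eager_scan(cols: Columns, field: str, pred: Predicate) -> Tuple[List[dict], int]:
    """Assemble every record first, then filter it.

    Returns the matching records and the number of values "deserialized"
    (each value copied into a record counts as one decode).
    """
    decoded = 0
    out: List[dict] = []
    for i in range(len(cols[field])):
        record = {}
        for name, values in cols.items():
            record[name] = values[i]
            decoded += 1
        if pred(record[field]):
            out.append(record)
    return out, decoded


def pushed_down_scan(cols: Columns, field: str, pred: Predicate) -> Tuple[List[dict], int]:
    """Evaluate the filter on its own column first, then decode the
    remaining columns only at the qualifying row positions."""
    decoded = len(cols[field])  # the filter column is read in full
    matches = [i for i, v in enumerate(cols[field]) if pred(v)]
    out: List[dict] = []
    for i in matches:
        record = {name: values[i] for name, values in cols.items()}
        decoded += len(cols) - 1  # count the other columns, read at this row only
        out.append(record)
    return out, decoded


# Usage: a 1,000-row table with three fields and a predicate matching one row.
table: Columns = {
    "id": list(range(1000)),
    "qty": [i % 50 for i in range(1000)],
    "note": ["n%d" % i for i in range(1000)],
}
eager_rows, eager_cost = eager_scan(table, "id", lambda v: v == 7)
lazy_rows, lazy_cost = pushed_down_scan(table, "id", lambda v: v == 7)
print(eager_cost, lazy_cost)  # 3000 1002
```

With one matching row out of 1,000, the pushed-down scan decodes 1,002 values instead of 3,000; the gap grows with the number of columns and shrinks as the filter becomes less selective.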

We present CORES (Column-Oriented Regeneration Embedding Scheme), which pushes highly selective filters down into column-based storage engines; each filter consists of several filtering conditions on one field. We demonstrate that applying highly selective filters in the storage layer can significantly reduce both deserialization and IO costs. We show how to compose filtering results at a fine granularity, and we generalize this technique with two pairwise operations, rollup and drilldown, so that a series of conjunctive filters can effectively deliver their payloads across a nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries. Experiments on a real workload and a modified TPC-H benchmark demonstrate that CORES improves performance by 0.7× to 26.9× compared with state-of-the-art platforms in scan-intensive workloads.
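The abstract names rollup and drilldown without defining them. Purely to illustrate the idea of moving filter results between nesting levels, the Python sketch below assumes one common representation: filter results are per-level bitmaps, and a parent-to-child offsets array links the levels (parent p owns children offsets[p] through offsets[p+1]-1, as in Arrow-style list columns). Under these assumed semantics, rollup marks a parent if any of its children qualifies, and drilldown copies each parent's bit to all of its children. The names rollup, drilldown, and conjoin, and the orders/line-items data, are hypothetical and not taken from the CORES code base [8]; the paper's own definitions may differ.

```python
from typing import List

Bitmap = List[bool]


def rollup(child_bits: Bitmap, offsets: List[int]) -> Bitmap:
    """Lift a child-level filter result to the parent level.

    Parent p owns children offsets[p]:offsets[p + 1]; it qualifies if at least
    one of its children qualifies (assumed existential semantics).
    """
    return [any(child_bits[offsets[p]:offsets[p + 1]]) for p in range(len(offsets) - 1)]


def drilldown(parent_bits: Bitmap, offsets: List[int]) -> Bitmap:
    """Push a parent-level filter result down: each child inherits its parent's bit."""
    out: Bitmap = []
    for p in range(len(offsets) - 1):
        out.extend([parent_bits[p]] * (offsets[p + 1] - offsets[p]))
    return out


def conjoin(a: Bitmap, b: Bitmap) -> Bitmap:
    """AND two filter results defined on the same nesting level."""
    return [x and y for x, y in zip(a, b)]


# Example: three orders (parent level) owning 2, 0 and 3 line items (child level).
offsets = [0, 2, 2, 5]
priority = ["HIGH", "HIGH", "LOW"]  # parent-level field
quantity = [5, 40, 12, 45, 3]       # child-level field

# Conjunctive query: HIGH-priority orders that contain an item with quantity > 30.
item_bits = [q > 30 for q in quantity]                 # filter on the child field
order_bits = conjoin([p == "HIGH" for p in priority],  # filter on the parent field
                     rollup(item_bits, offsets))       # child result lifted to orders
# Deliver child-level payloads for the qualifying orders.
child_mask = drilldown(order_bits, offsets)
items_of_orders = [q for q, b in zip(quantity, child_mask) if b]
matching_items = [q for q, b in zip(quantity, conjoin(item_bits, child_mask)) if b]
print(order_bits, items_of_orders, matching_items)  # [True, False, False] [5, 40] [40]
```

Because each operation touches only bitmaps and offsets, filters on different levels can be combined before any payload column is decoded; only the final child-level mask decides which values are materialized, either the whole hierarchical entity ([5, 40]) or just the matching items ([40]).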

References

[1]

Apache. 2017. Apache Hive™. Retrieved June 13, 2019 from https://hive.apache.org.

[2]

Apache. 2017. Apache Parquet. Retrieved June 13, 2019 from https://parquet.apache.org.

[3]

Apache. 2017. Apache Spark. Retrieved June 13, 2019 from https://spark.apache.org.

[4]

Apache. 2017. Apache Tez. Retrieved June 13, 2019 from https://tez.apache.org.

[5]

Apache. 2018. Apache AsterixDB. Retrieved June 13, 2019 from https://asterixdb.apache.org.

[6]

Apache. 2018. Apache Avro. Retrieved June 13, 2019 from https://avro.apache.org.

[7]

Google. 2017. Protocol Buffers. Retrieved June 13, 2019 from http://code.google.com/p/protobuf/.

[8]

Yang Li. 2018. Cores. Retrieved June 13, 2019 from https://github.com/lwhay/cores.

[9]

NCBI. 2018. Retrieved June 13, 2019 from http://www.ncbi.nlm.nih.gov.

[10]

TPC. 2017. TPC-H benchmark. Retrieved June 13, 2019 from http://www.tpc.org/tpch.

[11]

Foto N. Afrati, Dan Delorey, Mosha Pasumansky, and Jeffrey D. Ullman. 2014. Storing and querying tree-structured records in Dremel. PVLDB 7, 12 (2014), 1131--1142.

[12]

Anastassia Ailamaki, David J. DeWitt, and Mark D. Hill. 2002. Data Page Layouts for Relational Databases on Deep Memory Hierarchies. Springer-Verlag New York, Inc. 198--215.

[13]

Sattam Alsubaiee, Yasser Altowim, et al. 2014. AsterixDB: a scalable, open source BDMS. PVLDB 7, 14 (2014), 1905--1916.

[14]

Sattam Alsubaiee, Alexander Behm, Vinayak R. Borkar, et al. 2014. Storage management in AsterixDB. PVLDB 7, 10 (2014), 841--852.

[15]

Gopi Attaluri, Shaorong Liu, and Guy M. Lohman. 2013. DB2 with BLU acceleration: So much more than just a column store. Proceedings of the VLDB Endowment 6, 11 (2013), 1080--1091.

[16]

François Bancilhon, Philippe Richard, and Michel Scholl. 1982. On line processing of compacted relations. In Proceedings of the 8th International Conference on Very Large Data Bases. 263--269.

[17]

Babak Behzad, Huong Vu Thanh Luu, et al. 2013. Taming parallel I/O complexity with auto-tuning. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis. 68--79.

[18]

Kevin Beyer and Raghu Ramakrishnan. 1999. Bottom-up computation of sparse and iceberg CUBEs. SIGMOD Record 28, 2 (1999), 359--370.

[19]

Medha Bhadkamkar, Fernando Farfan, Vagelis Hristidis, and Raju Rangaswami. 2009. Storing semi-structured data on disk drives. Trans. Storage 5, 2, Article 6 (June 2009), 35 pages.

[20]

Peter Boncz, Torsten Grust, Maurice Van Keulen, Stefan Manegold, Jan Rittinger, and Jens Teubner. 2006. MonetDB/XQuery: A fast XQuery processor powered by a relational engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 479--490.

[21]

Vinayak Borkar, Michael Carey, Raman Grover, et al. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proceedings of the IEEE International Conference on Data Engineering. 1151--1162.

[22]

C. Chasseur, Yinan Li, and J. M. Patel. 2013. Enabling JSON document stores in relational systems (long version). In Proceedings of the International Workshop on the Web and Databases. 1--16.

[23]

Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih. 2018. UnistorFS: A union storage file system design for resource sharing between memory and storage on persistent RAM-based systems. ACM Trans. Storage 14, 1, Article 3 (Feb. 2018), 22 pages.

[24]

Douglas W. Cornell and Philip S. Yu. 1987. A vertical partitioning algorithm for relational databases. In Proceedings of the IEEE International Conference on Data Engineering. 30--35.

[25]

Graham Cormode, Minos Garofalakis, et al. 2012. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends Databases 4, 1 (2012), 1--294.

[26]

Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 10.

[27]

M. J. Egenhofer. 1994. Spatial SQL: A query and presentation language. IEEE Trans. Knowl. Data Eng. 6, 1 (1994), 86--95.

[28]

Avrilia Floratou and Umar Farooq Minhas. 2014. SQL-on-Hadoop: Full circle back to shared-nothing database architectures. Proceedings of the VLDB Endowment 7, 12 (Jan. 2014), 1295--1306.

[29]

Raúl Gracia-Tinedo, Josep Sampé, et al. 2017. Crystal: Software-defined storage for multi-tenant object stores. In Proceedings of the USENIX Conference on File and Storage Technologies.

[30]

Bin He, Hui I. Hsiao, Ziyang Liu, Yu Huang, and Yi Chen. 2012. Efficient iceberg query evaluation using compressed bitmap index. IEEE Trans. Knowl. Data Eng. 24, 9 (2012), 1570--1583.

[31]

Jianfeng Jia, Chen Li, and Michael J. Carey. 2017. Drum: A rhythmic approach to interactive analytics on large data. In Proceedings of the IEEE International Conference on Big Data.

[32]

Martin Kaufmann and Donald Kossmann. 2013. Storing and processing temporal data in a main memory column store. In Proceedings of the VLDB Endowment 6, 12 (2013), 1444--1449.

[33]

Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, and Teck Hua Lee. 2015. Oracle database in-memory: A dual format in-memory database. In Proceedings of the IEEE International Conference on Data Engineering. 1253--1258.

[34]

Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, et al. 2012. The Vertica analytic database: C-Store 7 years later. PVLDB 5, 12 (2012), 1790--1801.

[35]

Eunji Lee and Hyokyung Bahn. 2014. Caching strategies for high-performance storage media. Trans. Storage 10, 3, Article 11 (Aug. 2014), 22 pages.

[36]

Daniel Lemire, Robert Godin, et al. 2016. Optimizing Druid with Roaring bitmaps. In Proceedings of the International Database Engineering & Applications Symposium. 77--86.

[37]

Hang Liu and H. Howie Huang. 2017. Graphene: Fine-grained IO management for graph computing. In Proceedings of the USENIX Conference on File and Storage Technologies. 285--299.

[38]

Zhen Hua Liu, Beda Hammerschmidt, and Doug McMahon. 2014. JSON data management: Supporting schema-less development in RDBMS. In Proceedings of the ACM SIGMOD International Conference on Management of Data 7, 2 (2014), 1247--1258.

[39]

Peng Lu, Sai Wu, Lidan Shou, and Kian-Lee Tan. 2013. An efficient and compact indexing scheme for large-scale data store. In Proceedings of the IEEE International Conference on Data Engineering. 326--337.

[40]

Sagar S. Mane and M. Emmanuel. 2015. Review and comparative study of bitmap indexing techniques. Data Mining Knowl. Eng. 7, 1 (2015).

[41]

Sergey Melnik, Andrey Gubarev, Jing Jing Long, et al. 2010. Dremel: Interactive analysis of web-scale datasets. Commun. ACM 3, 12 (2010), 114--123.

[42]

Jan Paredaens and Dirk Van Gucht. 1988. Possibilities and limitations of using flat operators in nested algebra expressions. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 29--38.

[43]

H. B. Paul, H. J. Schek, and M. H. Scholl. 1987. Architecture and implementation of the Darmstadt database kernel system. In Proceedings of the ACM SIGMOD Conference. 196--207.

[44]

Mark A. Roth, Henry F. Korth, and Abraham Silberschatz. 1988. Extended algebra and calculus for nested relational databases. ACM Trans. Datab. Syst. 13, 4 (1988), 389--417.

[45]

Michael Rys and Gerhard Weikum. 1994. Heuristic optimization of speedup and benefit/cost for parallel database scans on shared-memory multiprocessors. In Proceedings of the International Parallel Processing Symposium. 894--901.

[46]

Marc H. Scholl, H.-Bernhard Paul, and Hans-Jörg Schek. 1987. Supporting flat relations by a nested relational kernel. In Proceedings of the International Conference on Very Large Data Bases. 137--146.

[47]

Anil Shanbhag, Alekh Jindal, Yi Lu, and Samuel Madden. 2016. Amoeba: A shape changing storage system for big data. PVLDB 9, 13 (2016), 1569--1572.

[48]

Jeff Shute, Radek Vingralek, et al. 2013. F1: A distributed SQL database that scales. Proceedings of the VLDB Endowment 6, 11 (2013), 1068--1079.

[49]

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In Proceedings of the IEEE Symposium on Mass Storage Systems and Technologies. 1--10.

[50]

Laure Soulier and Lynda Tamine. 2017. On the collaboration support in information retrieval. ACM Comput. Surv. 50, 4, Article 51 (2017), 34 pages.

[51]

Kurt Stockinger. 2001. Design and implementation of bitmap indices for scientific data. In Proceedings of the International Database Engineering and Applications Symposium. 47--57.

[52]

Mike Stonebraker, Daniel J. Abadi, Adam Batkin, et al. 2005. C-store: A column-oriented DBMS. In Proceedings of the International Conference on Very Large Data Bases. 553--564.

[53]

Liwen Sun, Sanjay Krishnan, Reynold S. Xin, and Michael J. Franklin. 2014. A partitioning framework for aggressive data skipping. PVLDB 7, 13 (2014), 1617--1620.

[54]

Yuliang Sun, Yu Wang, and Huazhong Yang. 2018. Bidirectional database storage and SQL query exploiting RRAM-based process-in-memory structure. ACM Trans. Storage 14, 1, Article 8 (March 2018), 19 pages.

[55]

Daniel Tahara, Thaddeus Diamond, and Daniel J. Abadi. 2014. Sinew: A SQL system for multi-structured data. In Proceedings of ACM SIGMOD Conference. 815--826.

[56]

Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, and Sam Madden. 2012. Lookup tables: Fine-grained partitioning for distributed databases. In Proceedings of the IEEE International Conference on Data Engineering. 102--113.

[57]

Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, et al. 2014. State-of-the-art in string similarity search and join. SIGMOD Record 43, 1 (2014), 64--76.

[58]

Zhiyi Wang and Shimin Chen. 2017. Exploiting common patterns for tree-structured data. In Proceedings of the ACM SIGMOD Conference. 883--896.

[59]

Brent Welch, Marc Unangst, Zainul Abbasi, et al. 2008. Scalable performance of the Panasas parallel file system. In Proceedings of the USENIX Conference on File and Storage Technologies. 2.

[60]

Chin-Hsien Wu and Kuo-Yi Huang. 2015. Data sorting in flash memory. Trans. Storage 11, 2, Article 7 (March 2015), 25 pages.

[61]

Pengfei Xuan, Walter B. Ligon, Pradip K. Srimani, Rong Ge, and Feng Luo. 2016. Accelerating big data analytics on HPC clusters using two-level storage. Parallel Comput. 61 (2016).

[62]

Atsuo Yoshitaka and Tadao Ichikawa. 1999. A survey on content-based retrieval for multimedia databases. IEEE Trans. Knowl. Data Eng. 11, 1 (1999), 81--93.

[63]

Yuan Yu, Michael Isard, Dennis Fetterly, et al. 2009. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1--14.

[64]

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, et al. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 2.

[65]

Yansong Zhang, Xuan Zhou, Ying Zhang, et al. 2016. Virtual denormalization via array index reference for main memory OLAP. IEEE Trans. Knowl. Data Eng. 28, 4 (2016), 1061--1074.

Cited By

Li W, Yang Z, Deng L, Cheng Z, Wen W, He Y. (2023). Accelerating Columnar Storage Based on Asynchronous Skipping Strategy. Big Data Research 31, 100352. Online publication date: Feb-2023.
https://doi.org/10.1016/j.bdr.2022.100352

Uta A, Ghit B, Dave A, Rellermeyer J, Boncz P. (2022). In-Memory Indexed Caching for Distributed Data Processing. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 104-114. Online publication date: May-2022.
https://doi.org/10.1109/IPDPS53621.2022.00019

Index Terms

CORES: Towards Scan-Optimized Columnar Storage for Nested Records
1. Information systems
  1. Information storage systems
    1. Record storage systems
      1. Relational storage
        Column based storage
    2. Storage management
      1. Hierarchical storage management

Information & Contributors

Information

Published In

ACM Transactions on Storage Volume 15, Issue 3

August 2019

173 pages

ISSN:1553-3077

EISSN:1553-3093

DOI:10.1145/3336116

Editor:
Sam H. Noh
Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2019

Accepted: 01 March 2019

Revised: 01 November 2018

Received: 01 April 2018

Published in TOS Volume 15, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China
National High Technology Research and Development Program of China

Bibliometrics & Citations

Article Metrics

Total Citations: 2

Total Downloads: 295

Downloads (Last 12 months): 30

Downloads (Last 6 weeks): 7

Reflects downloads up to 18 Aug 2024
