research-article

The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar Code

Published: 01 May 2023

Abstract

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC, and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes (DICT, FOR, DELTA, and RLE) through better data-parallelism. We do so by redesigning the compression layout using two main ideas: (i) generalizing the value-interleaving technique in the basic operation of bit-(un)packing by targeting a virtual 1024-bit SIMD register, and (ii) reordering the tuples in all columns of a table in the same Unified Transposed Layout, which puts tuple chunks in a common "04261537" order (explained in the paper). Together, these allow maximum independent work for all possible basic SIMD lane widths: 8, 16, 32, and 64 bits.
We address the software development, maintenance, and future-proofing challenges of increasing hardware diversity by defining a virtual 1024-bit instruction set that consists of simple operators supported by all SIMD dialects and, importantly, by scalar code. The interleaved and tuple-reordered layout actually makes scalar decoding faster, extracting more data-parallelism from today's wide-issue CPUs. Moreover, the scalar version can be fully auto-vectorized by modern compilers, eliminating technical debt in software caused by platform-specific SIMD intrinsics.
Micro-benchmarks on Intel, AMD, Apple, and AWS CPUs show that FastLanes accelerates decoding by factors, reaching more than 40 values decoded per CPU cycle. FastLanes can make queries faster, as compressing the data reduces bandwidth needs, while decoding is almost free.
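
To make the interleaving idea in the abstract concrete, the following is a minimal Python sketch of lane-interleaved bit-packing against a virtual 1024-bit register. It is an illustration under stated assumptions, not the paper's implementation: the function names `pack` and `unpack`, the block size of 1024 values, and the word ordering are all hypothetical choices made for this sketch. It covers only idea (i) and does not model the Unified Transposed Layout or its "04261537" order.

```python
# Illustrative sketch only (assumed names and layout, not the FastLanes code).
REGISTER_BITS = 1024  # virtual register width named in the abstract
BLOCK = 1024          # values per block: an assumption of this sketch


def pack(values, width, lane_bits=32):
    """Pack BLOCK unsigned ints of `width` bits into lane-interleaved words."""
    lanes = REGISTER_BITS // lane_bits   # e.g. 32 lanes of 32 bits each
    rows = BLOCK // lanes                # number of values stored per lane
    assert len(values) == BLOCK and 0 < width <= lane_bits
    mask = (1 << width) - 1
    lane_mask = (1 << lane_bits) - 1
    # Output is `width` virtual registers, each holding `lanes` words.
    out = [0] * (width * lanes)
    for row in range(rows):
        # Bit offset of this row inside every lane's private bit stream.
        word, shift = divmod(row * width, lane_bits)
        for lane in range(lanes):        # identical operation in every lane
            v = values[row * lanes + lane] & mask
            out[word * lanes + lane] |= (v << shift) & lane_mask
            if shift + width > lane_bits:  # value straddles two words
                out[(word + 1) * lanes + lane] |= v >> (lane_bits - shift)
    return out


def unpack(words, width, lane_bits=32):
    """Inverse of pack(): recover the BLOCK original values."""
    lanes = REGISTER_BITS // lane_bits
    rows = BLOCK // lanes
    mask = (1 << width) - 1
    out = [0] * BLOCK
    for row in range(rows):
        word, shift = divmod(row * width, lane_bits)
        for lane in range(lanes):        # no dependency between lanes
            v = words[word * lanes + lane] >> shift
            if shift + width > lane_bits:
                v |= words[(word + 1) * lanes + lane] << (lane_bits - shift)
            out[row * lanes + lane] = v & mask
    return out


if __name__ == "__main__":
    data = [i % 8 for i in range(BLOCK)]  # every value fits in 3 bits
    packed = pack(data, width=3)
    assert unpack(packed, width=3) == data
```

In this sketch the inner loop performs the same shift, mask, and OR on every lane with no cross-lane dependency, which is the property that lets equivalent scalar code in a compiled language map onto SIMD instructions. The Python version only shows the data layout and says nothing about performance.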

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 9
May 2023
330 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Cited By

  • (2024) Composable Data Management: An Execution Overview. Proceedings of the VLDB Endowment 17:12 (4249-4252). DOI: 10.14778/3685800.3685847. Online publication date: 8-Nov-2024
  • (2024) NULLS!: Revisiting Null Representation in Modern Columnar Formats. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-10). DOI: 10.1145/3662010.3663452. Online publication date: 10-Jun-2024
  • (2024) Accelerating GPU Data Processing using FastLanes Compression. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-11). DOI: 10.1145/3662010.3663450. Online publication date: 10-Jun-2024
  • (2023) An Empirical Evaluation of Columnar Storage Formats. Proceedings of the VLDB Endowment 17:2 (148-161). DOI: 10.14778/3626292.3626298. Online publication date: 1-Oct-2023
  • (2023) ALP: Adaptive Lossless floating-Point Compression. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626717. Online publication date: 12-Dec-2023
