research-article

The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar Code

Authors:

Peter Boncz

Proceedings of the VLDB Endowment, Volume 16, Issue 9

Pages 2132 - 2144

https://doi.org/10.14778/3598581.3598587

Published: 01 May 2023

Abstract

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes (DICT, FOR, DELTA and RLE) through better data-parallelism. We do so by redesigning the compression layout using two main ideas: (i) generalizing the value interleaving technique in the basic operation of bit-(un)packing by targeting a virtual 1024-bit SIMD register, and (ii) reordering the tuples in all columns of a table in the same Unified Transposed Layout, which puts tuple chunks in a common "04261537" order (explained in the paper). Together, these allow for maximum independent work for all possible basic SIMD lane widths: 8, 16, 32, and 64 bits.
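To make idea (i) concrete, the following is a minimal Python sketch of interleaved bit-packing over a 1024-value block. It is an illustration under stated assumptions, not the FastLanes implementation: the function names, the round-robin rule "value i goes to lane i % lanes", and the optional `base` argument (a FOR-style reference value) are all hypothetical choices made for this example, and Python is used only to show the layout, not the speed.

```python
from typing import List

REGISTER_BITS = 1024  # width of the virtual SIMD register named in the abstract


def pack_interleaved(values: List[int], bit_width: int, lane_bits: int = 32) -> List[int]:
    """Pack 1024 unsigned `bit_width`-bit values into `lane_bits`-wide words.

    Assumption for this sketch: value i is assigned to lane i % lanes, so
    consecutive values land in different lanes.
    """
    lanes = REGISTER_BITS // lane_bits        # e.g. 32 lanes of 32 bits
    assert len(values) == lanes * lane_bits   # always 1024 values per block
    assert 0 < bit_width <= lane_bits
    word_mask = (1 << lane_bits) - 1
    packed = [0] * (bit_width * lanes)        # bit_width rows of `lanes` words
    for lane in range(lanes):
        for row in range(lane_bits):          # row-th value stored in this lane
            value = values[row * lanes + lane]
            assert 0 <= value < (1 << bit_width)
            word, shift = divmod(row * bit_width, lane_bits)
            packed[word * lanes + lane] |= (value << shift) & word_mask
            spill = shift + bit_width - lane_bits
            if spill > 0:                     # value straddles two words
                packed[(word + 1) * lanes + lane] |= value >> (bit_width - spill)
    return packed


def unpack_interleaved(packed: List[int], bit_width: int,
                       lane_bits: int = 32, base: int = 0) -> List[int]:
    """Inverse of pack_interleaved; `base` is added to every value (FOR-style)."""
    lanes = REGISTER_BITS // lane_bits
    value_mask = (1 << bit_width) - 1
    out = [0] * (lanes * lane_bits)
    for row in range(lane_bits):
        word, shift = divmod(row * bit_width, lane_bits)
        spill = shift + bit_width - lane_bits
        # Every lane executes the same shift/mask sequence on its own word,
        # which is the property that makes the loop over lanes data-parallel.
        for lane in range(lanes):
            value = packed[word * lanes + lane] >> shift
            if spill > 0:
                value |= packed[(word + 1) * lanes + lane] << (bit_width - spill)
            out[row * lanes + lane] = (value & value_mask) + base
    return out
```

In this sketch the shift amounts and word indices depend only on `row`, never on `lane`, so the inner loop contains no lane-specific control flow. The same functions work for lane widths of 8, 16, 32 and 64 bits by changing `lane_bits`.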

We address the software development, maintenance and future-proofing challenges of increasing hardware diversity by defining a virtual 1024-bit instruction set that consists of simple operators supported by all SIMD dialects and, importantly, also by scalar code. The interleaved and tuple-reordered layout actually makes scalar decoding faster, extracting more data-parallelism from today's wide-issue CPUs. The scalar version can also be fully auto-vectorized by modern compilers, eliminating technical debt in software caused by platform-specific SIMD intrinsics.
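Why a reordered layout exposes independent work can be illustrated with DELTA decoding, which is normally a serial prefix sum. The sketch below is a deliberately simplified model and does not reproduce the paper's "04261537" order: it assumes 16 lanes (a hypothetical count), gives each lane a contiguous run of 64 values plus its own base value, and stores the deltas row by row so that one lane-wise addition advances every lane at once. All names are invented for this example.

```python
from typing import List, Tuple

LANES = 16            # hypothetical lane count, chosen only for illustration
ROWS = 1024 // LANES  # number of values each lane decodes


def delta_encode_transposed(values: List[int]) -> Tuple[List[int], List[List[int]]]:
    """Split 1024 values into LANES contiguous runs and delta-encode each run.

    Returns one base per lane and ROWS rows of LANES deltas; row 0 is all
    zeros because the first value of each run is carried by its base.
    """
    assert len(values) == LANES * ROWS
    bases = [values[lane * ROWS] for lane in range(LANES)]
    rows = [[0] * LANES for _ in range(ROWS)]
    for lane in range(LANES):
        start = lane * ROWS
        for r in range(1, ROWS):
            rows[r][lane] = values[start + r] - values[start + r - 1]
    return bases, rows


def delta_decode_transposed(bases: List[int], rows: List[List[int]]) -> List[int]:
    """Rebuild the original order using only lane-wise additions."""
    acc = list(bases)
    out = [0] * (LANES * ROWS)
    for r, row in enumerate(rows):
        # One addition per lane; no lane reads another lane's accumulator,
        # so the additions within a row are independent of each other.
        acc = [a + d for a, d in zip(acc, row)]
        for lane in range(LANES):
            out[lane * ROWS + r] = acc[lane]
    return out
```

The dependency chain in this model has length `ROWS` rather than 1024, and each step uses only an element-wise add, the kind of simple operator that the text says is available in every SIMD dialect and in scalar code.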

Micro-benchmarks on Intel, AMD, Apple and AWS CPUs show that FastLanes speeds up decoding severalfold, decoding more than 40 values per CPU cycle. FastLanes can make queries faster, as compressing the data reduces bandwidth needs, while decoding is almost free.
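The per-cycle figure can be related to the "100 billion integers per second" in the title with simple arithmetic. The sketch below assumes a 3 GHz clock, which is a hypothetical value chosen for illustration and not a figure reported in the abstract; the function names are likewise invented.

```python
def values_per_second(values_per_cycle: int, clock_hz: int) -> int:
    """Decoding throughput for a given per-cycle rate and clock frequency."""
    return values_per_cycle * clock_hz


def clock_needed_hz(target_values_per_second: int, values_per_cycle: int) -> int:
    """Smallest whole clock frequency that reaches the target throughput."""
    return -(-target_values_per_second // values_per_cycle)  # ceiling division


ASSUMED_CLOCK_HZ = 3_000_000_000  # assumption: 3 GHz core, not from the paper

rate = values_per_second(40, ASSUMED_CLOCK_HZ)
needed = clock_needed_hz(100_000_000_000, 40)
print(f"{rate:,} values/s at the assumed clock; {needed:,} Hz needed for 100 billion/s")
```

Under these assumptions, 40 values per cycle corresponds to 120 billion values per second, and any clock of at least 2.5 GHz reaches the 100 billion mark.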

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 9

May 2023

330 pages

ISSN: 2150-8097

Editors: Georgia Koutrika (Athena Research Center), Jun Yang (Duke University)

Publisher: VLDB Endowment

Badges: Artifacts Available / v1.1
