research-article

Camel: Efficient Compression of Floating-Point Time Series

Published: 20 December 2024

Abstract

Time series compression encodes the information in a time-ordered sequence of data points into fewer bits, thereby reducing storage costs and possibly other costs. Compression methods are either general or XOR-based. General compression methods are time-consuming and are not suitable in streaming scenarios, while XOR-based methods are unable to consistently maintain high compression ratios. Further, existing methods compress the integer and decimal parts of floating-point values as a whole, thus disregarding the different characteristics of the two parts. We propose Camel, a new compression method for floating-point time series with the goal of advancing the achievable compression ratios and efficiency. Camel compresses the integer and decimal parts of the double-precision floating-point numbers in time series separately. Instead of XORing each value with its previous value, Camel identifies values that enable higher compression ratios. Camel also includes means of indexing compressed data, thereby making it possible to query compressed data efficiently. We report on an empirical study of Camel and 11 lossless and 6 lossy compression methods on 22 public datasets and three industrial datasets from AliCloud. The study offers evidence that Camel is capable of outperforming existing methods in terms of both compression ratio and efficiency and is capable of excellent compression performance on both time series and non-time-series data.
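The two ideas the abstract contrasts can be illustrated with a minimal Python sketch. It is not Camel's encoding, which the abstract does not specify; it only shows (a) the XOR-with-previous-value step that XOR-based methods build on and (b) a naive split of a double into integer and decimal parts. The function names double_to_bits, xor_with_previous, and split_parts are hypothetical and chosen for illustration only.

```python
import math
import struct


def double_to_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_with_previous(values):
    """XOR each value's bit pattern with that of its predecessor.

    Identical neighbours yield 0; similar neighbours yield a word with
    many zero bits, which an XOR-based encoder can then store compactly.
    """
    bits = [double_to_bits(v) for v in values]
    return [cur ^ prev for prev, cur in zip(bits, bits[1:])]


def split_parts(x: float):
    """Split a double into (integer part, decimal part).

    Illustrative assumption: the two parts are then handled separately,
    e.g. integer parts of neighbouring readings often differ only slightly.
    """
    frac, whole = math.modf(x)
    return int(whole), frac


readings = [21.5, 21.5, 21.75, 22.0]
xors = xor_with_previous(readings)
parts = [split_parts(v) for v in readings]
int_deltas = [b[0] - a[0] for a, b in zip(parts, parts[1:])]
```

In this sketch, xors[0] is zero because the first two readings are identical, and int_deltas holds the small differences between consecutive integer parts. How Camel actually selects reference values and encodes each part is described in the paper itself, not here.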


Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 6
SIGMOD
December 2024
792 pages
EISSN: 2836-6573
DOI: 10.1145/3709598
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data compression
  2. time series
  3. time series query

Qualifiers

  • Research-article

Funding Sources

  • NSFC
