research-article

Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

Published: 01 April 2024

Abstract

As the number of pre-trained machine learning (ML) models grows exponentially, data reduction tools are not keeping pace. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files, largely because the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility, are poorly understood.
This paper presents the first exhaustive analysis to date of the storage compressibility of PTM datasets. Our analysis spans different types of data reduction and compression techniques, from hash-based data deduplication and data similarity detection to dictionary-coding compression, and explores them at three data granularity levels: model layers, model chunks, and model parameters. Our observations indicate that modern data reduction tools are not effective when handling PTM datasets, and that there is a pressing need for new compression methods that take PTMs' data characteristics into account for effective storage reduction.
Motivated by our findings, we design Elf, a simple yet effective, error-bounded, lossy floating-point compression method. Elf transforms floating-point parameters such that the common exponent field of the transformed parameters can be eliminated entirely to save storage space. We develop Elves, a compression framework that integrates Elf along with several other data reduction methods and applies the most effective one to PTMs exhibiting different patterns. Evaluation shows that Elves achieves an overall compression ratio of 1.52×, which is 1.31×, 1.32×, and 1.29× higher than a general-purpose compressor (zstd), an error-bounded lossy compressor (SZ3), and uniform model quantization, respectively, with negligible loss of model accuracy.
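The abstract only hints at how the exponent field becomes removable, so the following is a minimal, hypothetical sketch of the general idea: affinely remap parameters into [1, 2) under a user-supplied error bound, so that every transformed float32 shares the same IEEE-754 exponent (and sign) and only truncated mantissa bits need to be stored. This illustrates the concept and is not the paper's Elf algorithm; the function names, the specific mapping, and the bit-budget rule are all assumptions.

    import numpy as np

    def pack_params(params: np.ndarray, eps: float):
        """Remap parameters into [1, 2) so every float32 shares exponent 127 and sign 0;
        keep only the top mantissa bits needed to respect the absolute error bound eps."""
        lo, hi = float(params.min()), float(params.max())
        scale = (hi - lo) or 1.0
        y = 1.0 + (params.astype(np.float64) - lo) / scale * (1.0 - 2.0**-23)  # y in [1, 2)
        bits = y.astype(np.float32).view(np.uint32)
        kept = 23 if eps <= 0 else min(23, max(1, int(np.ceil(np.log2(scale / eps)))))
        mantissa = (bits & np.uint32(0x007FFFFF)) >> np.uint32(23 - kept)  # sign/exponent dropped
        return mantissa, kept, lo, scale

    def unpack_params(mantissa, kept, lo, scale):
        bits = np.uint32(127 << 23) | (mantissa.astype(np.uint32) << np.uint32(23 - kept))
        y = bits.view(np.float32).astype(np.float64)
        return (y - 1.0) / (1.0 - 2.0**-23) * scale + lo

    # Round trip: maximum absolute error stays within (roughly) eps.
    w = np.random.randn(1024).astype(np.float32)
    restored = unpack_params(*pack_params(w, eps=1e-3))
    assert np.max(np.abs(restored - w)) <= 1e-3 + 1e-6

In this toy version, the small per-tensor metadata (kept, lo, scale) replaces the eliminated sign and exponent fields; the retained mantissa bits would then be bit-packed and passed to a downstream entropy coder.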

Published In

Proceedings of the VLDB Endowment  Volume 17, Issue 8
April 2024
335 pages

Publisher

VLDB Endowment

Publication History

Published: 01 April 2024
Published in PVLDB Volume 17, Issue 8


Qualifiers

  • Research-article

