DOI: 10.1145/3410463.3414624
research-article
Open access

cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data

Published: 30 September 2020

Abstract

Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also retains high fidelity for postanalysis. Because supercomputers and HPC applications are becoming heterogeneous through accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present cuSZ, an optimized GPU version of SZ, one of the best error-bounded lossy compressors. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme that entirely removes the data dependency in the prediction step of SZ, so that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that cuSZ improves SZ's compression throughput by up to 370.1x and 13.1x over the production version running on single and multiple CPU cores, respectively, while achieving the same quality of reconstructed data. It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU-supported lossy compressor.
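
The abstract does not spell out the dual-quantization scheme, so the following is only a minimal 1-D sketch of the general idea, not the authors' CUDA implementation. It assumes that values are first snapped to integer multiples of 2*eb (prequantization) and that prediction then runs on those integers with a simple previous-value predictor (postquantization). The function names, the predictor choice, and the sample data are all hypothetical.

```python
# Hypothetical 1-D sketch of dual quantization; all names are illustrative
# assumptions and the previous-value predictor is a simplification.

def prequantize(data, eb):
    """Snap each value to the nearest integer multiple of 2*eb (the only lossy step)."""
    return [round(x / (2 * eb)) for x in data]

def postquantize(q):
    """Integer difference between each value and its previous-value prediction.

    Every output reads only prequantized inputs, never a previously
    reconstructed value, so each element could be computed independently.
    """
    return [q[i] - (q[i - 1] if i > 0 else 0) for i in range(len(q))]

def compress(data, eb):
    return postquantize(prequantize(data, eb))

def decompress(codes, eb):
    """Undo the prediction with a running sum, then scale back by 2*eb."""
    out, acc = [], 0
    for c in codes:
        acc += c
        out.append(acc * 2 * eb)
    return out

data = [1.000, 1.013, 1.074, 0.951, 0.902]
eb = 0.01  # absolute error bound
codes = compress(data, eb)        # [50, 1, 3, -6, -3]
restored = decompress(codes, eb)
max_err = max(abs(a - b) for a, b in zip(data, restored))
```

In this sketch the rounding in `prequantize` is the only source of error, which keeps each reconstructed value within `eb` of the original. The small integer codes are what an entropy coder such as Huffman coding would then compress.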

    Published In

    PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
    September 2020
    505 pages
    ISBN:9781450380751
    DOI:10.1145/3410463
    General Chair: Vivek Sarkar
    Program Chair: Hyesoon Kim
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cuda
    2. gpu
    3. lossy compression
    4. performance
    5. scientific data

    Qualifiers

    • Research-article

    Conference

    PACT '20
    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Article Metrics

    • Downloads (Last 12 months)331
    • Downloads (Last 6 weeks)41
    Reflects downloads up to 04 Oct 2024
