Abstract
As data sizes continue to grow faster than computational resources, there is a pressing need for data reduction techniques that can significantly reduce the amount of data while quantifying the error incurred by compression. Scientific data poses many challenges for reduction techniques: it is often defined on non-uniform or unstructured meshes, comes from a high-dimensional space, and has many Quantities of Interest (QoIs) that need to be preserved. To illustrate these challenges, we focus on data from a large-scale fusion code, XGC. XGC uses a Particle-In-Cell (PIC) technique that generates hundreds of petabytes (PB) of data a day from thousands of timesteps. XGC uses an unstructured mesh and needs to compute many QoIs from the raw data, f.
One critical aspect of the reduction is the need to ensure that QoIs derived from the data (density, temperature, flux-surface-averaged momenta, etc.) maintain relatively high accuracy. We show that by compressing XGC data on the high-dimensional, non-uniform grid on which the data is defined, and by adaptively quantizing the decomposed coefficients based on the characteristics of the QoIs, the compression ratios obtained at various error tolerances with a multilevel compressor (MGARD) increase by more than a factor of ten. We then show how to mathematically guarantee that the accuracy of the QoIs computed from the reduced f is preserved during compression. Using the mathematical QoI error control theory of MGARD, the error in the XGC density can be kept under a user-specified tolerance over 1000 timesteps of simulation, whereas traditional error control on the data being reduced does not guarantee the accuracy of the QoIs.
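The distinction between bounding the error in f and bounding the error in a QoI can be illustrated with a minimal sketch. It is not MGARD's algorithm or API. It assumes a hypothetical density defined as a weighted sum of samples of f, and it swaps in plain uniform quantization for the multilevel compressor. Under those assumptions, a pointwise bound eps on f implies a density error of at most eps times the sum of the absolute weights, so a tolerance tau on the density can be met by choosing eps accordingly. The names `quantize`, `density`, and `tolerance_for_qoi`, the weights, and the sample data are all illustrative.

```python
import random


def quantize(values, eps):
    """Uniform scalar quantization; each value moves by at most eps."""
    step = 2.0 * eps
    return [step * round(v / step) for v in values]


def density(f, weights):
    """Hypothetical linear QoI: a weighted sum of the samples of f."""
    return sum(w * v for w, v in zip(weights, f))


def tolerance_for_qoi(weights, tau):
    """Pointwise tolerance on f that keeps the density error within tau.

    Uses |sum(w_i * e_i)| <= max|e_i| * sum(|w_i|) for a linear QoI.
    """
    return tau / sum(abs(w) for w in weights)


random.seed(0)
f = [random.random() for _ in range(1000)]  # stand-in for the raw data f
weights = [0.01] * 1000                     # assumed quadrature weights
tau = 1e-3                                  # user-specified QoI tolerance

eps = tolerance_for_qoi(weights, tau)
f_reduced = quantize(f, eps)

max_pointwise_error = max(abs(a - b) for a, b in zip(f, f_reduced))
qoi_error = abs(density(f, weights) - density(f_reduced, weights))
print(f"eps = {eps:.3e}, max pointwise error = {max_pointwise_error:.3e}")
print(f"density error = {qoi_error:.3e} (tolerance {tau:.1e})")
```

In this sketch the tolerance on f is derived from the QoI rather than chosen independently, which is the reverse of traditional error control on the data alone. For nonlinear QoIs such as temperature, this simple bound does not apply directly.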
Acknowledgement
This research was supported by the ECP CODAR, Sirius-2, and RAPIDS-2 projects through the Advanced Scientific Computing Research (ASCR) program of the Department of Energy, and by the LDRD project through the DRD program of Oak Ridge National Laboratory.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Gong, Q. et al. (2022). Maintaining Trust in Reduction: Preserving the Accuracy of Quantities of Interest for Lossy Compression. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_2
DOI: https://doi.org/10.1007/978-3-030-96498-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96497-9
Online ISBN: 978-3-030-96498-6
eBook Packages: Computer Science, Computer Science (R0)