DOI: 10.1145/3281464.3281467
short-paper
Public Access

In situ data-driven adaptive sampling for large-scale simulation data summarization

Published: 11 November 2018

Abstract

Recent advances in high-performance computing have enabled scientists to model scientific phenomena in great detail. However, analyzing and visualizing the output of such large-scale simulations poses significant challenges because of its excessive size and disk I/O bottlenecks. One viable solution is to create a sub-sampled dataset that preserves the important information in the data while being significantly smaller than the raw data. Creating an in situ workflow for generating such intelligently sub-sampled datasets is of prime importance for these simulations. In this work, we propose an information-driven data sampling technique and compare it with two well-known sampling methods to demonstrate the superiority of the proposed method. The in situ performance of the proposed method is evaluated by applying it to the Nyx cosmology simulation. We compare and contrast the performance of these sampling algorithms and provide a holistic view of all the methods so that scientists can choose an appropriate sampling scheme based on their analysis requirements.
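The abstract does not spell out how the information-driven sampler works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one plausible scheme: build a histogram of a scalar field, give every non-empty bin an equal share of the sample budget, and accept each point with a probability that shrinks as its bin gets more crowded. The function names (`bin_index`, `rarity_weighted_sample`), the equal-share rule, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a histogram bin index in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    # The maximum value would land on index n_bins, so clamp it to the last bin.
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


def rarity_weighted_sample(values, fraction, n_bins=16, seed=0):
    """Return indices of kept points, favouring sparsely populated bins.

    Hypothetical scheme (an assumption, not taken from the paper): each
    non-empty bin receives an equal share of the sample budget, and a point
    is accepted with probability min(1, share / bin_count).
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    bins = [bin_index(v, lo, hi, n_bins) for v in values]
    counts = Counter(bins)
    per_bin_budget = fraction * len(values) / len(counts)
    kept = []
    for i, b in enumerate(bins):
        accept_prob = min(1.0, per_bin_budget / counts[b])
        if rng.random() < accept_prob:
            kept.append(i)
    return kept


# Toy field: 1000 common values near zero plus 5 rare extreme values.
data_rng = random.Random(1)
field = [data_rng.uniform(0.0, 1.0) for _ in range(1000)] + [10.0] * 5

kept = rarity_weighted_sample(field, fraction=0.1)
rare_kept = sum(1 for i in kept if field[i] == 10.0)
print(f"kept {len(kept)} of {len(field)} points; rare points kept: {rare_kept}")
```

Under this scheme the five extreme values are always retained because their bin holds fewer points than its budget share, whereas uniform random sampling at the same rate would usually drop most of them. Unused budget from sparse bins is not redistributed here, so the sample can come out smaller than the requested fraction.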

Information & Contributors

Information

Published In

ISAV '18: Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization
November 2018
43 pages
ISBN: 9781450365796
DOI: 10.1145/3281464
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper

Conference

ISAV'18

Acceptance Rates

ISAV '18 Paper Acceptance Rate 6 of 16 submissions, 38%;
Overall Acceptance Rate 23 of 63 submissions, 37%

Cited By

  • (2023) Accelerated dynamic data reduction using spatial and temporal properties. International Journal of High Performance Computing Applications 37:5 (539-559). DOI: 10.1177/10943420231180504. Online publication date: 1-Sep-2023
  • (2023) Sub-Linear Time Sampling Approach for Large-Scale Data Visualization Using Reinforcement Learning. 2023 IEEE 13th Symposium on Large Data Analysis and Visualization (LDAV) (12-16). DOI: 10.1109/LDAV60332.2023.00008. Online publication date: 23-Oct-2023
  • (2023) Uniform-in-phase-space data selection with iterative normalizing flows. Data-Centric Engineering. DOI: 10.1017/dce.2023.44. Online publication date: 25-Apr-2023
  • (2022) VDL-Surrogate: A View-Dependent Latent-based Model for Parameter Space Exploration of Ensemble Simulations. IEEE Transactions on Visualization and Computer Graphics (1-11). DOI: 10.1109/TVCG.2022.3209413. Online publication date: 2022
  • (2022) Exploring Data Reduction Techniques for Additive Manufacturing Analysis. 2022 IEEE/ACM 8th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD) (21-28). DOI: 10.1109/DRBSD56682.2022.00008. Online publication date: Nov-2022
  • (2022) Estimating Potential Error in Sampling Interpolation. 2022 IEEE International Conference on Big Data (Big Data) (3153-3162). DOI: 10.1109/BigData55660.2022.10020913. Online publication date: 17-Dec-2022
  • (2022) Sampling for Scientific Data Analysis and Reduction. In Situ Visualization for Computational Science (11-36). DOI: 10.1007/978-3-030-81627-8_2. Online publication date: 5-May-2022
  • (2021) Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis. Algorithms 14:5 (154). DOI: 10.3390/a14050154. Online publication date: 12-May-2021
  • (2021) In Situ Climate Modeling for Analyzing Extreme Weather Events. ISAV'21: In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (18-23). DOI: 10.1145/3490138.3490142. Online publication date: 15-Nov-2021
  • (2021) In-Situ Spatial Inference on Climate Simulations with Sparse Gaussian Processes. ISAV'21: In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (31-36). DOI: 10.1145/3490138.3490140. Online publication date: 15-Nov-2021
