DOI: 10.1145/2749246.2749268

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps

Published: 15 June 2015

Abstract

Neither the memory capacity, memory access speeds, nor disk bandwidths are increasing at the same rate as the computing power in current and upcoming parallel machines. This has led to considerable recent research on in-situ data analytics. However, many open questions remain on how to perform such analytics, especially in memory constrained systems. Building on our earlier work that demonstrated bitmap indices (bitmaps) can be a suitable summary structure for key (offline) analytics tasks, this paper develops an in-situ analysis approach that performs data reduction (such as time-steps selection) using just bitmaps, and subsequently, stores only the selected bitmaps for post-analysis. We construct compressed bitmaps on the fly, show that many kinds of in-situ analyses can be supported by bitmaps without requiring the original data (and thus reducing memory requirements for in-situ analysis), and instead of writing the original simulation output, we only write the selected bitmaps to the disks (reducing the I/O requirements). We also demonstrate that we are able to use bitmaps for key offline analysis steps. We extensively evaluate our method with different simulations and applications, and demonstrate the effectiveness of our approach.
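
To make the idea of analyzing a time step from its bitmaps alone more concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the abstract describes compressed bitmaps, whereas this sketch uses plain Python integers as uncompressed bitvectors, and the function names (`build_bitmaps`, `entropy`, `conditional_entropy`, `select_time_steps`), the equal-width binning, and the threshold-based selection rule are all hypothetical choices made for illustration. It shows that once values are binned into one bitvector per bin, histogram counts come from population counts and joint counts between two time steps come from bitwise AND, so the original values are no longer needed.

```python
import math


def build_bitmaps(values, lo, hi, num_bins):
    """Build one bitvector per bin (hypothetical helper).

    Bit i of bitmaps[b] is set when values[i] falls into bin b.
    Equal-width bins over [lo, hi] are an assumption of this sketch.
    """
    width = (hi - lo) / num_bins
    bitmaps = [0] * num_bins
    for i, v in enumerate(values):
        # Clamp so that out-of-range values and v == hi land in a valid bin.
        b = min(max(int((v - lo) / width), 0), num_bins - 1)
        bitmaps[b] |= 1 << i
    return bitmaps


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def entropy(bitmaps, n):
    """Shannon entropy (in bits) of the binned distribution, from bit counts only."""
    h = 0.0
    for bm in bitmaps:
        c = popcount(bm)
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h


def conditional_entropy(bitmaps_a, bitmaps_b, n):
    """H(B | A) in bits; joint counts come from ANDing pairs of bitvectors."""
    h = 0.0
    for a in bitmaps_a:
        count_a = popcount(a)
        if count_a == 0:
            continue
        for b in bitmaps_b:
            count_ab = popcount(a & b)
            if count_ab:
                h -= (count_ab / n) * math.log2(count_ab / count_a)
    return h


def select_time_steps(steps, lo, hi, num_bins, threshold):
    """Keep a time step when it adds enough information over the last kept one.

    The greedy threshold rule is an illustrative assumption, not the
    selection method of the paper.
    """
    n = len(steps[0])
    kept = [0]
    prev = build_bitmaps(steps[0], lo, hi, num_bins)
    for t in range(1, len(steps)):
        cur = build_bitmaps(steps[t], lo, hi, num_bins)
        if conditional_entropy(prev, cur, n) > threshold:
            kept.append(t)
            prev = cur  # only the selected bitmaps would be retained
    return kept
```

In this sketch only the bitvectors of the most recently kept time step stay in memory between iterations, which mirrors the abstract's point that the analysis does not require the original data. A real in-situ setting would use a compressed bitmap format and a binning scheme suited to the simulation's value range.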

Published In

HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
June 2015
296 pages
ISBN:9781450335508
DOI:10.1145/2749246
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bitmaps
  2. correlation analysis
  3. data reduction
  4. in-situ analysis
  5. time-steps selection

Qualifiers

  • Research-article

Funding Sources

  • Lucy Nowell

Conference

HPDC'15

Acceptance Rates

HPDC '15 Paper Acceptance Rate 19 of 116 submissions, 16%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0

Reflects downloads up to 11 Feb 2025.

Cited By

  • (2023) Workload-Aware Cache Management of Bitmap Indices. Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies, pages 1-10. DOI: 10.1145/3632366.3632386. Online publication date: 4-Dec-2023.
  • (2022) Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics, 12:1 (53). DOI: 10.3390/electronics12010053. Online publication date: 23-Dec-2022.
  • (2022) Hierarchical Bitmap Indexing for Range Queries on Multidimensional Arrays. Database Systems for Advanced Applications, pages 509-525. DOI: 10.1007/978-3-031-00123-9_40. Online publication date: 8-Apr-2022.
  • (2021) Caching Support for Range Query Processing on Bitmap Indices. Proceedings of the 33rd International Conference on Scientific and Statistical Database Management, pages 49-60. DOI: 10.1145/3468791.3468800. Online publication date: 6-Jul-2021.
  • (2021) Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 547-556. DOI: 10.1109/IPDPS49936.2021.00063. Online publication date: May-2021.
  • (2020) MoHA: A Composable System for Efficient In-Situ Analytics on Heterogeneous HPC Systems. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-16. DOI: 10.1109/SC41405.2020.00086. Online publication date: Nov-2020.
  • (2019) STASH: Fast Hierarchical Aggregation Queries for Effective Visual Spatiotemporal Explorations. 2019 IEEE International Conference on Cluster Computing (CLUSTER), pages 1-11. DOI: 10.1109/CLUSTER.2019.8891029. Online publication date: Sep-2019.
  • (2019) DeStager. Distributed and Parallel Databases, 37:1 (209-231). DOI: 10.1007/s10619-018-7235-3. Online publication date: 1-Mar-2019.
  • (2018) High-Performance Agent-Based Modeling Applied to Vocal Fold Inflammation and Repair. Frontiers in Physiology, 9. DOI: 10.3389/fphys.2018.00304. Online publication date: 12-Apr-2018.
  • (2018) Information Guided Data Sampling and Recovery Using Bitmap Indexing. 2018 IEEE Pacific Visualization Symposium (PacificVis), pages 56-65. DOI: 10.1109/PacificVis.2018.00016. Online publication date: Apr-2018.
