DOI: 10.1109/SBAC-PADW.2014.32
Article

Exploratory Analysis of Raw Data Files through Dataflows

Published: 22 October 2014

Abstract

Scientific applications generate raw data files at very large scale. Most of these files follow a standard format established by the application domain, such as HDF5, NetCDF and FITS. These formats are supported by a variety of programming languages, libraries and programs. Because the files are so large in scale, analyzing them requires writing a specific program. Generic data analysis systems such as database management systems (DBMS) are not well suited because of the cost of data loading and data transformation at large scale. Recently there have been several proposals for indexing and querying raw data files without the overhead of using a DBMS, such as NoDB, RAW and FastBit. Their goal is to offer query support over the raw data file after a scientific program has generated it. However, these solutions focus on the analysis of a single large file. When a large number of related files are all required for the evaluation of one scientific hypothesis, the relationships must be managed manually or by writing specific programs. The proposed approach takes advantage of existing provenance data support from Scientific Workflow Management Systems (SWfMS). When scientific applications are managed by a SWfMS, the data is registered in the provenance database at runtime. Therefore, this provenance data may act as a description of these files. When the SWfMS is dataflow aware, it registers domain data all in the same database. The resulting database becomes an important access method to the large number of files generated by the scientific workflow execution, and thus complements the single raw data file analysis support. In this work, we present our dataflow approach for analyzing data from several raw data files and evaluate it with the Montage application from the astronomy domain.
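
The idea of using the provenance database as an access method to many raw files can be illustrated with a minimal sketch. It is not the authors' implementation: the schema (`task`, `raw_file`), the helper names, the file paths and the stored values are all hypothetical toy data, and SQLite merely stands in for whatever database a SWfMS would use. The sketch assumes that a dataflow-aware SWfMS recorded, at runtime, which activity produced each file together with a few domain values extracted from it, so that a query can narrow down the files worth opening.

```python
import sqlite3


def build_provenance_db():
    """Create a toy in-memory provenance database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (
            task_id  INTEGER PRIMARY KEY,
            activity TEXT NOT NULL           -- workflow activity that ran
        );
        CREATE TABLE raw_file (
            file_id INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES task(task_id),
            path    TEXT NOT NULL,           -- location of the raw data file
            crval1  REAL,                    -- domain value captured at runtime
            crval2  REAL                     -- domain value captured at runtime
        );
    """)
    # Illustrative rows only; activity names and values are assumptions.
    conn.executemany(
        "INSERT INTO task VALUES (?, ?)",
        [(1, "mProject"), (2, "mProject"), (3, "mAdd")],
    )
    conn.executemany(
        "INSERT INTO raw_file VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "/data/p1.fits", 210.5, -5.0),
            (2, 2, "/data/p2.fits", 211.5, -5.2),
            (3, 3, "/data/mosaic.fits", 211.0, -5.1),
        ],
    )
    return conn


def files_for(conn, activity, min_crval1):
    """Return paths of files produced by `activity` whose crval1 >= min_crval1."""
    rows = conn.execute(
        """
        SELECT f.path
        FROM raw_file AS f
        JOIN task AS t ON t.task_id = f.task_id
        WHERE t.activity = ? AND f.crval1 >= ?
        ORDER BY f.path
        """,
        (activity, min_crval1),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_provenance_db()
    # Only the selected files would then be opened with a format-specific reader.
    print(files_for(conn, "mProject", 211.0))
```

In this sketch the relationships between files live in the database rather than in a hand-written program, and a single-file tool could still be applied to each path the query returns.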

Cited By

  • (2015) AQUAdex. Proceedings, Part II, of the 15th International Conference on Algorithms and Architectures for Parallel Processing - Volume 9529, pp. 92-105. DOI: 10.1007/978-3-319-27122-4_7. Online publication date: 18-Nov-2015

Published In

SBAC-PADW '14: Proceedings of the 2014 International Symposium on Computer Architecture and High Performance Computing Workshop
October 2014
143 pages
ISBN: 9781479970148

Publisher

IEEE Computer Society

United States

Author Tags

  1. data analysis
  2. high performance computing
  3. in situ processing
  4. raw data processing
  5. scientific workflows

Qualifiers

  • Article
