An algebraic approach for data-centric scientific workflows

E Ogasawara, D de Oliveira, P Valduriez… - Proceedings of the …, 2011 - inria.hal.science
Proceedings of the VLDB Endowment (PVLDB), 2011
Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, and thus require execution on large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad hoc, and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational algebra) and a parallel execution model that together enable automatic optimization of scientific workflows. We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad hoc workflow implementation.