DOI: 10.1109/eScience.2015.40
Article

dispel4py: An Agile Framework for Data-Intensive eScience

Published: 31 August 2015

Abstract

We present dispel4py, a versatile data-intensive kit presented as a standard Python library. It empowers scientists to experiment and test ideas using their familiar rapid-prototyping environment. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, to move seamlessly into production with large-scale data loads. The mappings are fully automated, so that the encoded data analyses and data handling are completely unchanged. The underpinning model is lightweight composition of fine-grained operations on data, coupled together by data streams that use the lowest-cost technology available. These fine-grained workflows are locally interpreted during development and mapped to multiple nodes and systems such as MPI and Storm for production. We explain why such an approach is becoming essential if data-driven research is to innovate rapidly and exploit the growing wealth of data while adapting to current technical trends. We show how provenance management is provided to improve understanding and reproducibility, and how a registry supports consistency and sharing. Three application domains are reported, and measurements on multiple infrastructures show the optimisations achieved. Finally, we present the next steps to achieve scalability and performance.
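
The composition model described in the abstract (fine-grained operations coupled by data streams) can be illustrated with a minimal sketch in plain Python. This is not the dispel4py API: the names `read_numbers`, `keep_even`, `square` and `compose` are hypothetical, and generators stand in for the data streams, assuming a simple linear pipeline interpreted locally.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical names throughout; this does NOT use the dispel4py API.
# Each stage is a fine-grained operation that consumes one stream
# and produces another, one item at a time.
Stage = Callable[[Iterable[int]], Iterator[int]]


def read_numbers(limit: int) -> Iterator[int]:
    """Source stage: emit the integers 0 .. limit-1 as a stream."""
    for n in range(limit):
        yield n


def keep_even(stream: Iterable[int]) -> Iterator[int]:
    """Filter stage: pass through only the even items."""
    for item in stream:
        if item % 2 == 0:
            yield item


def square(stream: Iterable[int]) -> Iterator[int]:
    """Transform stage: square each item."""
    for item in stream:
        yield item * item


def compose(*stages: Stage) -> Stage:
    """Couple stages so that the output stream of one feeds the next."""
    def run(source: Iterable[int]) -> Iterator[int]:
        stream = source
        for stage in stages:
            stream = stage(stream)
        return iter(stream)
    return run


# Local interpretation: the whole pipeline runs lazily in one process.
pipeline = compose(keep_even, square)
result = list(pipeline(read_numbers(6)))
print(result)
```

Because each stage only sees a stream, the definitions of the stages would not need to change if the coupling were replaced by something other than in-process generators, which is the property the abstract attributes to the automated mappings.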

Cited By

  • (2023) Optimization towards Efficiency and Stateful of dispel4py. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 2021-2032. DOI: 10.1145/3624062.3624281. Online publication date: 12-Nov-2023
  • (2019) Orchestrating Big Data Analysis Workflows in the Cloud. ACM Computing Surveys 52:5, pp. 1-41. DOI: 10.1145/3332301. Online publication date: 13-Sep-2019
  • (2016) Asterism. Proceedings of the 7th International Workshop on Data-Intensive Computing in the Cloud, pp. 1-8. DOI: 10.5555/3018100.3018101. Online publication date: 13-Nov-2016

Published In

E-SCIENCE '15: Proceedings of the 2015 IEEE 11th International Conference on e-Science
August 2015
590 pages
ISBN: 9781467393256

Publisher

IEEE Computer Society

United States

Author Tags

  1. data intensive application
  2. distributed systems
  3. eSciences workflows
  4. python frameworks
  5. run time analysis

Qualifiers

  • Article
