DOI: 10.1145/3078597.3078611

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows

Published: 26 June 2017

    Abstract

    Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing input, output, and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes make managing data for scientific workflows more complex. Thus, scientific workflow data needs to be managed across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce the Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance, and scalability gains of MaDaTS compared to the traditional approach of managing data in scientific workflows.
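
The idea of an abstraction that hides the storage tiers while leaving the placement strategy under user control can be illustrated with a minimal sketch. This is not the MaDaTS API: the class names, tier names, path layout, and the example policy below are all hypothetical and chosen only to show the shape of the concept.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical tier names, ordered fastest to slowest; not taken from the paper.
TIERS = ["burst_buffer", "parallel_fs", "archive"]


@dataclass
class DataObject:
    """A workflow data item: input, intermediate, or output."""
    name: str
    size_gb: float
    kind: str  # "input", "intermediate", or "output"


# A strategy maps a data object to the name of a storage tier.
Strategy = Callable[[DataObject], str]


def intermediates_on_fast_tier(obj: DataObject) -> str:
    """Illustrative policy: place intermediate data on the fastest tier."""
    return "burst_buffer" if obj.kind == "intermediate" else "parallel_fs"


@dataclass
class VirtualDataSpace:
    """Tracks data by logical name and hides which tier holds each item."""
    strategy: Strategy
    placement: Dict[str, str] = field(default_factory=dict)

    def add(self, obj: DataObject) -> None:
        # The user-supplied strategy decides the tier; the caller never names one.
        tier = self.strategy(obj)
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        self.placement[obj.name] = tier

    def locate(self, name: str) -> str:
        """Resolve a logical name to a tier-qualified path (hypothetical layout)."""
        return f"/{self.placement[name]}/{name}"


vds = VirtualDataSpace(strategy=intermediates_on_fast_tier)
vds.add(DataObject("raw.h5", 120.0, "input"))
vds.add(DataObject("stage1.tmp", 40.0, "intermediate"))
print(vds.locate("stage1.tmp"))  # prints /burst_buffer/stage1.tmp
```

Workflow tasks in this sketch refer to data only by logical name, so swapping in a different strategy function changes placement without touching task code.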

    Information

    Published In

    HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
    June 2017
    254 pages
    ISBN:9781450346993
    DOI:10.1145/3078597
    © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. burst buffer
    2. data management
    3. multi-tiered storage
    4. scientific workflows

    Qualifiers

    • Research-article

    Conference

    HPDC '17

    Acceptance Rates

    HPDC '17 Paper Acceptance Rate 19 of 100 submissions, 19%;
    Overall Acceptance Rate 166 of 966 submissions, 17%
