
CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows

Published: 18 May 2020

    Abstract

    The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing cost of data movement and I/O and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., application workflows executed entirely on the HPC system, have emerged as an attractive approach to these data-related challenges because they move computation closer to the data, and staging-based frameworks have been used effectively to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, leaving them susceptible to costly data loss from failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure coding can increase latency and/or incur significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replication and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, as well as a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime, deployed it with the DataSpaces staging service on leadership-class computing machines, and present an experimental evaluation. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.
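
    To make the hybrid scheme concrete, the sketch below (C++) shows one plausible way an access-pattern-driven policy could choose between n-way replication for frequently read data and (k, m) Reed-Solomon erasure coding for cold data, and compares their storage overheads. This is a minimal sketch under stated assumptions: the Policy structure, the hot_threshold parameter, and the (k, m) values are hypothetical illustrations, not CoREC's actual API or heuristics.

```cpp
#include <cstdio>
#include <initializer_list>

// Hypothetical sketch (not CoREC's actual API): choose a resilience scheme
// per data object from its observed access pattern. Hot objects get n-way
// replication (fast reads, high storage cost); cold objects get (k, m)
// Reed-Solomon erasure coding (k data + m parity chunks, lower storage cost
// but reads of a lost chunk require decoding). All names and thresholds
// here are illustrative assumptions.

enum class Scheme { Replicate, ErasureCode };

struct Policy {
    unsigned hot_threshold;  // accesses per window above which data is "hot"
    unsigned replicas;       // n for n-way replication
    unsigned rs_k, rs_m;     // Reed-Solomon data/parity chunk counts

    Scheme choose(unsigned recent_accesses) const {
        return recent_accesses >= hot_threshold ? Scheme::Replicate
                                                : Scheme::ErasureCode;
    }

    // Storage footprint relative to a single unprotected copy.
    double overhead(Scheme s) const {
        return s == Scheme::Replicate
                   ? static_cast<double>(replicas)
                   : 1.0 + static_cast<double>(rs_m) / rs_k;
    }
};

int main() {
    Policy p{/*hot_threshold=*/8, /*replicas=*/3, /*rs_k=*/6, /*rs_m=*/3};
    for (unsigned accesses : {2u, 16u}) {
        Scheme s = p.choose(accesses);
        std::printf("accesses=%2u -> %s, storage = %.2fx\n", accesses,
                    s == Scheme::Replicate ? "replicate" : "erasure-code",
                    p.overhead(s));
    }
    return 0;
}
```

    With these illustrative numbers, 3-way replication stores 3.0x the object size and survives two lost copies, while (6, 3) erasure coding stores only 1.5x and still tolerates three lost chunks, at the cost of decode latency on degraded reads. This is the latency/storage trade-off the abstract describes the hybrid approach navigating.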


    Published In

    ACM Transactions on Parallel Computing, Volume 7, Issue 2
    June 2020
    182 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3400890

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 18 May 2020
    Online AM (Accepted Manuscript): 07 May 2020
    Accepted: 01 February 2020
    Revised: 01 December 2019
    Received: 01 June 2019
    Published in TOPC Volume 7, Issue 2


    Author Tags

    1. Data resilience
    2. data staging
    3. erasure codes
    4. in-situ workflows
    5. replication

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 131
    • Downloads (last 6 weeks): 20
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2023) Accelerating In Situ Analysis using Non-volatile Memory. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 995-1004. https://doi.org/10.1145/3624062.3624176. Online publication date: 12-Nov-2023.
    • (2023) RAPIDS: Reconciling Availability, Accuracy, and Performance in Managing Geo-Distributed Scientific Data. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 87-100. https://doi.org/10.1145/3588195.3592983. Online publication date: 7-Aug-2023.
    • (2023) Towards elastic in situ analysis for high-performance computing simulations. Journal of Parallel and Distributed Computing, 177, 106-116. https://doi.org/10.1016/j.jpdc.2023.02.014. Online publication date: Jul-2023.
    • (2023) Adaptive elasticity policies for staging-based in situ visualization. Future Generation Computer Systems, 142, 75-89. https://doi.org/10.1016/j.future.2022.12.010. Online publication date: May-2023.
    • (2023) Dynamic Data-Driven Application Systems for Reservoir Simulation-Based Optimization: Lessons Learned and Future Trends. Handbook of Dynamic Data Driven Applications Systems, 287-330. https://doi.org/10.1007/978-3-031-27986-7_11. Online publication date: 6-Sep-2023.
    • (2022) Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 538-548. https://doi.org/10.1109/IPDPS53621.2022.00059. Online publication date: May-2022.
    • (2021) An Adaptive Elasticity Policy For Staging Based In-Situ Processing. 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 33-41. https://doi.org/10.1109/WORKS54523.2021.00010. Online publication date: Nov-2021.
