
CoREC: Scalable and Resilient In-memory Data Staging for In-situ Workflows

Published: 18 May 2020

    Abstract

    The dramatic increase in the scale of current and planned high-end HPC systems is leading to new challenges, such as the growing cost of data movement and I/O and the reduced mean time between failures (MTBF) of system components. In-situ workflows, i.e., application workflows executed entirely on the HPC system, have emerged as an attractive approach to these data-related challenges because they move computation closer to the data, and staging-based frameworks have been used effectively to support in-situ workflows at scale. However, the resilience of these staging-based solutions has not been addressed, leaving them susceptible to costly data loss from failures. Furthermore, naive use of data resilience techniques such as n-way replication and erasure coding can increase latency and/or incur significant storage overheads. In this article, we present CoREC, a scalable and resilient in-memory data staging runtime for large-scale in-situ workflows. CoREC uses a novel hybrid approach that combines dynamic replication with erasure coding based on data access patterns. It also leverages multiple levels of replication and erasure coding to support diverse data resiliency requirements. Furthermore, the article presents optimizations for load balancing and conflict-avoiding encoding, as well as a low-overhead, lazy data recovery scheme. We have implemented the CoREC runtime, deployed it with the DataSpaces staging service on leadership-class computing machines, and present an experimental evaluation. The experiments demonstrate that CoREC can tolerate in-memory data failures while maintaining low latency and sustaining high overall storage efficiency at large scales.
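
    To make the hybrid scheme concrete, the sketch below (C++) shows one plausible way an access-pattern-driven policy could choose between n-way replication for frequently read data and (k, m) Reed-Solomon erasure coding for cold data, and compares their storage overheads. This is a minimal sketch under stated assumptions: the Policy structure, the hot_threshold parameter, and the (k, m) values are hypothetical illustrations, not CoREC's actual API or heuristics.

```cpp
#include <cstdio>
#include <initializer_list>

// Hypothetical sketch (not CoREC's actual API): choose a resilience scheme
// per data object from its observed access pattern. Hot objects get n-way
// replication (fast reads, high storage cost); cold objects get (k, m)
// Reed-Solomon erasure coding (k data + m parity chunks, lower storage cost
// but reads of a lost chunk require decoding). All names and thresholds
// here are illustrative assumptions.

enum class Scheme { Replicate, ErasureCode };

struct Policy {
    unsigned hot_threshold;  // accesses per window above which data is "hot"
    unsigned replicas;       // n for n-way replication
    unsigned rs_k, rs_m;     // Reed-Solomon data/parity chunk counts

    Scheme choose(unsigned recent_accesses) const {
        return recent_accesses >= hot_threshold ? Scheme::Replicate
                                                : Scheme::ErasureCode;
    }

    // Storage footprint relative to a single unprotected copy.
    double overhead(Scheme s) const {
        return s == Scheme::Replicate
                   ? static_cast<double>(replicas)
                   : 1.0 + static_cast<double>(rs_m) / rs_k;
    }
};

int main() {
    Policy p{/*hot_threshold=*/8, /*replicas=*/3, /*rs_k=*/6, /*rs_m=*/3};
    for (unsigned accesses : {2u, 16u}) {
        Scheme s = p.choose(accesses);
        std::printf("accesses=%2u -> %s, storage = %.2fx\n", accesses,
                    s == Scheme::Replicate ? "replicate" : "erasure-code",
                    p.overhead(s));
    }
    return 0;
}
```

    With these illustrative numbers, 3-way replication stores 3.0x the object size and survives two lost copies, while (6, 3) erasure coding stores only 1.5x and still tolerates three lost chunks, at the cost of decode latency on degraded reads. This is the latency/storage trade-off the abstract describes the hybrid approach navigating.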


    Published In

    ACM Transactions on Parallel Computing, Volume 7, Issue 2
    June 2020
    182 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3400890

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 18 May 2020
    Online AM (Accepted Manuscript): 07 May 2020
    Accepted: 01 February 2020
    Revised: 01 December 2019
    Received: 01 June 2019
    Published in TOPC Volume 7, Issue 2


    Author Tags

    1. Data resilience
    2. data staging
    3. erasure codes
    4. in-situ workflows
    5. replication

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 131
    • Downloads (last 6 weeks): 20
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2023) Accelerating In Situ Analysis using Non-volatile Memory. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 995-1004. https://doi.org/10.1145/3624062.3624176. Online publication date: 12-Nov-2023.
    • (2023) RAPIDS: Reconciling Availability, Accuracy, and Performance in Managing Geo-Distributed Scientific Data. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 87-100. https://doi.org/10.1145/3588195.3592983. Online publication date: 7-Aug-2023.
    • (2023) Towards elastic in situ analysis for high-performance computing simulations. Journal of Parallel and Distributed Computing, 177, 106-116. https://doi.org/10.1016/j.jpdc.2023.02.014. Online publication date: Jul-2023.
    • (2023) Adaptive elasticity policies for staging-based in situ visualization. Future Generation Computer Systems, 142, 75-89. https://doi.org/10.1016/j.future.2022.12.010. Online publication date: May-2023.
    • (2023) Dynamic Data-Driven Application Systems for Reservoir Simulation-Based Optimization: Lessons Learned and Future Trends. Handbook of Dynamic Data Driven Applications Systems, 287-330. https://doi.org/10.1007/978-3-031-27986-7_11. Online publication date: 6-Sep-2023.
    • (2022) Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 538-548. https://doi.org/10.1109/IPDPS53621.2022.00059. Online publication date: May-2022.
    • (2021) An Adaptive Elasticity Policy For Staging Based In-Situ Processing. 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 33-41. https://doi.org/10.1109/WORKS54523.2021.00010. Online publication date: Nov-2021.
