Characterizing Output Bottlenecks of a Production Supercomputer: Analysis and Implications

Published: 16 January 2020
    Abstract

    This article studies the I/O write behaviors of the Titan supercomputer and its Lustre parallel file stores under production load. The results can inform the design, deployment, and configuration of file systems along with the design of I/O software in the application, operating system, and adaptive I/O libraries.
    We propose a statistical benchmarking methodology to measure write performance across I/O configurations, hardware settings, and system conditions. We also introduce two relative measures to quantify the write-performance behaviors of hardware components under production load. In addition to designing experiments and benchmarking on Titan, we verify the experimental results with a real application and a real application I/O kernel, XGC and HACC IO, respectively; both are representative of, and widely used to exercise, the typical I/O behaviors of HPC applications.
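    The benchmarking tool itself is not reproduced here; the following is a minimal C sketch of the underlying measurement idea, under the assumption that each trial writes a fixed-size buffer to a fresh file and times the write plus fsync. The file size, trial count, and output paths are hypothetical placeholders, not values from the study.

```c
/* Minimal sketch (not the authors' benchmark): time repeated fixed-size
 * writes and report per-trial bandwidth, so that a median and spread can
 * be computed over many trials and system conditions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define WRITE_SIZE (64UL << 20)   /* 64 MiB per trial: arbitrary choice */
#define TRIALS     20             /* arbitrary number of repetitions    */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    char *buf = malloc(WRITE_SIZE);
    if (!buf) return 1;
    memset(buf, 'x', WRITE_SIZE);

    for (int t = 0; t < TRIALS; t++) {
        char path[64];
        snprintf(path, sizeof(path), "bench_trial_%d.dat", t);

        double start = now_sec();
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return 1;
        size_t done = 0;
        while (done < WRITE_SIZE) {
            ssize_t n = write(fd, buf + done, WRITE_SIZE - done);
            if (n <= 0) return 1;
            done += (size_t)n;
        }
        fsync(fd);                 /* include the time to push data to storage */
        close(fd);
        double elapsed = now_sec() - start;

        printf("trial %d: %.2f MiB/s\n", t, (WRITE_SIZE / 1048576.0) / elapsed);
        unlink(path);
    }
    free(buf);
    return 0;
}
```

    Summarizing each configuration by the median and spread of many such per-trial bandwidths, rather than by a single run, is what makes comparisons meaningful under the transient load variability the study reports.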
    In summary, we find that Titan’s I/O system is variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write sharing of files across clients (compute nodes). I/O parallelism is most effective when the application—or its I/O libraries—distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good locations” in the machine or in the file system: component performance is driven by transient load conditions and past performance is not a useful predictor of future performance. For example, we do not observe diurnal load patterns that are predictable.
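    To make the favored output pattern concrete, here is a minimal hypothetical MPI sketch (not code from the article) of file-per-process writes with no striping: each client writes its own unshared file into a directory whose default Lustre layout has been set to a single stripe beforehand with the standard lfs setstripe -c 1 command. Paths and sizes are placeholders.

```c
/* Hypothetical illustration of the file-per-process pattern: one unshared
 * file per MPI rank, written into a directory that is assumed to exist and
 * to have been configured beforehand with:
 *     lfs setstripe -c 1 fpp_out
 * so that each file occupies a single OST (no striping). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (16UL << 20)   /* 16 MiB per rank: arbitrary */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char path[64];
    snprintf(path, sizeof(path), "fpp_out/rank_%d.dat", rank);

    char *buf = malloc(CHUNK);
    if (buf) {
        memset(buf, 'x', CHUNK);
        FILE *f = fopen(path, "wb");   /* one file per client: no write sharing */
        if (f) {
            fwrite(buf, 1, CHUNK, f);
            fclose(f);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

    Because every file has a single stripe, Lustre's allocator tends to place different clients' files on different OSTs, which approximates the balanced, low-contention distribution of load across targets that the abstract recommends.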



    Information & Contributors

    Information

    Published In

    ACM Transactions on Storage, Volume 15, Issue 4
    USENIX FAST 2019 Special Section and Regular Papers
    November 2019, 228 pages
    ISSN: 1553-3077
    EISSN: 1553-3093
    DOI: 10.1145/3373756
    Editor: Sam H. Noh
    This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 January 2020
    Accepted: 01 May 2019
    Revised: 01 March 2019
    Received: 01 April 2018
    Published in TOS Volume 15, Issue 4

    Author Tags

    1. High-performance computing
    2. benchmarking
    3. file systems


