DOI: 10.1145/3205289.3205305
research-article

Rethinking Node Allocation Strategy for Data-intensive Applications in Consideration of Spatially Bursty I/O

Published: 12 June 2018

Abstract

By default, job schedulers in HPC systems allocate adjacent compute nodes to a job to lower communication overhead. However, this strategy no longer suits data-intensive jobs running on systems with an I/O forwarding layer, where each I/O node performs I/O on behalf of a subset of nearby compute nodes. Under the default node allocation strategy, a job's nodes are located close to each other, so the job uses only a limited number of I/O nodes. Because the I/O activities of jobs are bursty, at any moment only a minority of jobs in the system are busy processing I/O. Consequently, the bursty I/O traffic in the system is also concentrated in space, making the load on I/O nodes highly unbalanced. In this paper, we use the job logs and I/O traces collected from Tianhe-1A to quantitatively analyze the two causes of spatially bursty I/O: uneven I/O traffic across a job's processes and uneven distribution of a job's nodes. Based on this analysis, we propose a node allocation strategy that takes into account the different amounts of I/O traffic generated by processes, so that the I/O traffic is spread more evenly across more I/O nodes. Our evaluations on Tianhe-1A with synthetic benchmarks and realistic applications show that the proposed strategy further exploits the potential of the I/O forwarding layer and improves I/O performance.
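
The imbalance described in the abstract can be illustrated with a small sketch. It is not the paper's allocation strategy. It assumes a hypothetical system in which every 8 consecutive compute nodes share one I/O node (`NODES_PER_ION`, `io_node_of`), and it scores load balance with a normalized Shannon entropy, a metric chosen here because "Information Entropy" appears among the author tags; how the paper actually uses entropy is not stated in the abstract. The strided allocation is likewise only an illustrative stand-in for "spreading a job over more I/O nodes".

```python
import math
from collections import Counter

NODES_PER_ION = 8  # assumption: compute nodes served by one I/O node


def io_node_of(compute_node: int) -> int:
    """Hypothetical static mapping: consecutive compute nodes share an I/O node."""
    return compute_node // NODES_PER_ION


def io_load(allocation, traffic):
    """Sum the I/O traffic of each allocated compute node onto its I/O node."""
    load = Counter()
    for node, amount in zip(allocation, traffic):
        load[io_node_of(node)] += amount
    return dict(load)


def normalized_entropy(load, num_io_nodes):
    """Shannon entropy of the load shares, scaled to [0, 1].

    1.0 means traffic is spread evenly over all I/O nodes;
    values near 0 mean it is concentrated on very few of them.
    """
    total = sum(load.values())
    if total == 0 or num_io_nodes < 2:
        return 0.0
    h = -sum((v / total) * math.log2(v / total) for v in load.values() if v > 0)
    return h / math.log2(num_io_nodes)


NUM_COMPUTE = 64
NUM_IO = NUM_COMPUTE // NODES_PER_ION  # 8 I/O nodes in this toy system

traffic = [1.0] * 16                       # a 16-node job with equal I/O per node
contiguous = list(range(16))               # default: adjacent compute nodes
strided = list(range(0, NUM_COMPUTE, 4))   # same job spread across the machine

for name, alloc in (("contiguous", contiguous), ("strided", strided)):
    load = io_load(alloc, traffic)
    print(name, "uses", len(load), "I/O nodes, balance =",
          round(normalized_entropy(load, NUM_IO), 3))
```

In this toy setup the contiguous job touches only two of the eight I/O nodes, while the strided job reaches all eight with equal load. Passing a non-uniform `traffic` list (for example, one rank doing most of the I/O) shows the second cause named in the abstract: even a well-spread allocation scores poorly unless the heavy processes are placed on different I/O nodes.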

    Published In

    ICS '18: Proceedings of the 2018 International Conference on Supercomputing
    June 2018
    407 pages
    ISBN: 9781450357838
    DOI: 10.1145/3205289
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. I/O Forwarding
    2. Information Entropy
    3. Job Scheduling
    4. Node Allocation
    5. Spatially Bursty I/O

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2024) Trade-off topology design for hierarchical network based on job characteristics. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-024-00193-z. Online publication date: 21-May-2024
    • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys 56:2 (1-41). DOI: 10.1145/3611007. Online publication date: 15-Sep-2023
    • (2023) DFBuffer: High-performance data forwarding software optimized for single-process I/O scenarios. 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS) (522-529). DOI: 10.1109/ICPADS56603.2022.00074. Online publication date: Jan-2023
    • (2021) Visual Analysis of the High-performance Computing Jobs Based on the Comprehensive Load Scoring Algorithm. 2021 7th International Conference on Computer and Communications (ICCC) (1436-1443). DOI: 10.1109/ICCC54389.2021.9674694. Online publication date: 10-Dec-2021
    • (2019) A Comprehensive Analysis of User Job Data on a Petascale Supercomputer Dedicated to CFD. 2019 IEEE 5th International Conference on Computer and Communications (ICCC) (86-91). DOI: 10.1109/ICCC47050.2019.9064094. Online publication date: Dec-2019
