DOI: 10.1145/3149457.3149464
HPCAsia Conference Proceedings · Research article

Improving Collective MPI-IO Using Topology-Aware Stepwise Data Aggregation with I/O Throttling

Published: 28 January 2018

    Abstract

    MPI-IO serves as the internal I/O interface layer of HDF5 and PnetCDF, where collective MPI-IO plays a major role in managing large-scale scientific data through parallel I/O. However, the existing collective MPI-IO optimization known as two-phase I/O has not been tuned sufficiently for recent supercomputers built on mesh/torus interconnects and large-scale parallel file systems, because its data transfers lack topology-awareness and it is not optimized for such file systems. In this paper, we propose I/O throttling and topology-aware stepwise data aggregation in the two-phase I/O of ROMIO, a representative MPI-IO library, to improve collective MPI-IO performance even when multiple processes run on each compute node. Throttling the I/O requests issued to a target file system mitigates I/O request contention, which improves I/O performance in the file access phase of two-phase I/O. A topology-aware aggregator layout that accounts for multiple aggregators per compute node alleviates contention in the data aggregation phase, and stepwise data aggregation further improves aggregation performance. HPIO benchmark results on the K computer show that the proposed optimization achieves up to about 73% and 39% improvements in write performance over the original implementation, using 12,288 processes on 3,072 compute nodes and 24,576 processes on 6,144 compute nodes, respectively.
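    The two mechanisms named above can be illustrated with a minimal, hypothetical sketch. This is plain Python with threads standing in for MPI processes, not ROMIO or MPI code, and every name in it (`NUM_AGGREGATORS`, `MAX_INFLIGHT_IO`, `write_domain`, and so on) is invented for illustration: in phase one, a subset of ranks (the aggregators) gathers noncontiguous per-rank data into contiguous file domains; in phase two, a semaphore throttles how many of those aggregators may issue I/O requests concurrently.

    ```python
    # Hypothetical sketch of two-phase I/O with throttling -- NOT ROMIO code.
    # Threads simulate MPI processes; an in-memory buffer simulates the file.
    import io
    import threading

    NUM_PROCS = 8            # simulated MPI processes
    NUM_AGGREGATORS = 2      # ranks that actually touch the file
    MAX_INFLIGHT_IO = 1      # throttle: concurrent I/O requests allowed

    # Phase 1 (data aggregation): each rank owns a small noncontiguous piece;
    # each aggregator collects the pieces falling into its contiguous domain.
    data = {rank: bytes([rank]) * 4 for rank in range(NUM_PROCS)}
    domain = NUM_PROCS // NUM_AGGREGATORS  # ranks per aggregator

    aggregated = {}
    for agg in range(NUM_AGGREGATORS):
        ranks = range(agg * domain, (agg + 1) * domain)
        aggregated[agg] = b"".join(data[r] for r in ranks)

    # Phase 2 (file access): aggregators write their contiguous domains, but
    # a semaphore limits how many I/O requests hit the file system at once.
    outfile = io.BytesIO(bytes(NUM_PROCS * 4))
    io_throttle = threading.Semaphore(MAX_INFLIGHT_IO)
    file_lock = threading.Lock()

    def write_domain(agg: int) -> None:
        with io_throttle:        # wait for an I/O slot (throttling)
            with file_lock:      # serialize access to the shared buffer
                outfile.seek(agg * domain * 4)
                outfile.write(aggregated[agg])

    threads = [threading.Thread(target=write_domain, args=(a,))
               for a in range(NUM_AGGREGATORS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    result = outfile.getvalue()
    ```

    In a real MPI program the same effect is obtained through a collective call such as `MPI_File_write_all`, with aggregator count and placement influenced via MPI Info hints; the paper's contribution is choosing that placement topology-aware and throttling the resulting file-system requests.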




      Published In

      HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
      January 2018, 322 pages
      ISBN: 9781450353724
      DOI: 10.1145/3149457

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. I/O throttling
      2. MPI-IO
      3. aggregator
      4. topology-aware stepwise data aggregation
      5. two-phase I/O

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      HPC Asia 2018

      Acceptance Rates

      HPCAsia '18 paper acceptance rate: 30 of 67 submissions, 45%
      Overall acceptance rate: 69 of 143 submissions, 48%


      Cited By

      • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys 56(2), 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023.
      • (2023) Strategies for Fast I/O Throughput in Large-Scale Climate Modeling Applications. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), 203-212. DOI: 10.1109/HiPC58850.2023.00038. Online publication date: 18-Dec-2023.
      • (2022) Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization. IEEE Transactions on Parallel and Distributed Systems 33(4), 878-890. DOI: 10.1109/TPDS.2021.3100784. Online publication date: 1-Apr-2022.
      • (2020) On Overlapping Communication and File I/O in Collective Write Operation. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1-8. DOI: 10.1109/IPDPSW50202.2020.00175. Online publication date: May-2020.
      • (2020) Characterizing I/O Optimization Effect Through Holistic Log Data Analysis of Parallel File Systems and Interconnects. High Performance Computing, 177-190. DOI: 10.1007/978-3-030-59851-8_11. Online publication date: 22-Jun-2020.
      • (2019) An Unsupervised Learning Approach for I/O Behavior Characterization. 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 33-40. DOI: 10.1109/SBAC-PAD.2019.00019. Online publication date: Oct-2019.
      • (2018) System Software for Many-Core and Multi-core Architecture. Advanced Software Technologies for Post-Peta Scale Computing, 59-75. DOI: 10.1007/978-981-13-1924-2_4. Online publication date: 7-Dec-2018.