Optimizing the SSD Burst Buffer by Traffic Detection

Published: 04 March 2020

Abstract

HPC storage systems still rely on hard disk drives (HDDs) as their dominant storage devices, and solid state drives (SSDs) are widely deployed as a buffer in front of the HDDs. The burst buffer has been proposed to manage SSD buffering of bursty write requests. Although a burst buffer can improve I/O performance in many cases, we find that it has limitations: it requires large SSD capacity, and it depends on the computation phase and the data flushing phase overlapping well.
In this article, we propose a scheme called SSDUP+ that aims to improve the burst buffer by addressing these limitations. First, to reduce the demand for SSD capacity, we develop a novel method to detect and quantify the data randomness in the write traffic, together with an adaptive algorithm that classifies random writes dynamically. As a result, much less SSD capacity is required to achieve performance similar to that of other burst buffer schemes. Second, to overcome the difficulty of perfectly overlapping the computation phase and the flushing phase, we propose a pipeline mechanism for the SSD buffer in which data buffering and flushing are performed in a pipelined fashion. Third, to improve I/O throughput, we adopt a traffic-aware flushing strategy that reduces I/O interference on the HDDs. Finally, to further improve the performance of buffering random writes in the SSD, SSDUP+ transforms random writes into sequential writes by storing the data in a log structure, and uses an AVL tree to store the sequence information of the data.
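The log-structured buffering idea can be illustrated with a minimal sketch. This is not the SSDUP+ implementation: the class name `LogBuffer`, its methods, and the in-memory `bytearray` standing in for the SSD are all hypothetical, and a sorted list maintained with `bisect` is swapped in for the AVL tree because Python has no built-in balanced tree. It assumes writes do not partially overlap; a later write to the same offset simply replaces the earlier one.

```python
import bisect


class LogBuffer:
    """Toy log-structured write buffer with an offset-ordered index (hypothetical)."""

    def __init__(self):
        self.log = bytearray()  # stands in for the SSD log; only ever appended to
        self.offsets = []       # original file offsets, kept sorted (AVL tree stand-in)
        self.entries = {}       # file offset -> (position in log, length)

    def write(self, offset, data):
        """Buffer a write: append to the log no matter how random `offset` is."""
        pos = len(self.log)
        self.log += data
        if offset not in self.entries:
            bisect.insort(self.offsets, offset)
        self.entries[offset] = (pos, len(data))  # the newest write to an offset wins

    def flush(self):
        """Yield (offset, data) in ascending offset order, as a sequential HDD pass."""
        for offset in self.offsets:
            pos, length = self.entries[offset]
            yield offset, bytes(self.log[pos:pos + length])


buf = LogBuffer()
for off, data in [(8192, b"cc"), (0, b"aa"), (4096, b"bb")]:
    buf.write(off, data)          # arrives in random offset order
print(list(buf.flush()))          # [(0, b'aa'), (4096, b'bb'), (8192, b'cc')]
```

The writes reach the log in arrival order, so the buffer device sees only sequential appends, while the ordered index lets the flush walk the data in file-offset order. A balanced tree such as the AVL tree named in the abstract would keep index insertion at O(log n), whereas `bisect.insort` on a list is O(n) per insert.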
We have implemented a prototype of SSDUP+ based on OrangeFS and conducted extensive experiments. The results show that SSDUP+ saves an average of 50% of SSD space while delivering almost the same performance as other common burst buffer schemes. In addition, SSDUP+ saves about 20% of SSD space compared with the previous version of this work, SSDUP, while achieving 20–30% higher I/O throughput.

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 17, Issue 1
    March 2020
    206 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3386454

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 March 2020
    Accepted: 01 December 2019
    Revised: 01 December 2019
    Received: 01 February 2019
    Published in TACO Volume 17, Issue 1

    Author Tags

    1. High-performance computing
    2. burst buffer
    3. hybrid storage system
    4. solid state drive

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSFC
    • National Key Research and Development Plan
    • Outstanding Youth Foundation of Hubei Province
