DOI: 10.1145/2612262.2612269
Research article

An evaluation of BitTorrent's performance in HPC environments

Published: 10 June 2014

    Abstract

    A number of novel decentralized systems have recently been developed to address challenges of scale in large distributed systems. The suitability of such systems for meeting the challenges of scale in high performance computing (HPC) systems is unclear, however. In this paper, we begin to answer this question by examining the suitability of the popular BitTorrent protocol to handle dynamic shared library distribution in HPC systems. To that end, we describe the architecture and implementation of a system that uses BitTorrent to distribute shared libraries in HPC systems, evaluate and optimize BitTorrent protocol usage for the HPC environment, and measure the performance of the resulting system. Our results demonstrate the potential viability of BitTorrent-style protocols in HPC systems, but also highlight the challenges of these protocols. In particular, our results show that the protocol mechanisms meant to enforce fairness in a distributed computing environment can have a significant impact on system performance if not properly taken into account in system design and implementation.
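
To illustrate why a limit on simultaneous uploads can matter at scale, the sketch below is a toy model and not the paper's system or its measurements. It assumes a single seed, whole-file transfers that each take one round, and a fixed number of peers served per holder per round (the hypothetical parameter `upload_slots`). It ignores bandwidth sharing between slots and piece-level exchange, both of which matter in real BitTorrent swarms.

```python
def rounds_to_distribute(num_nodes: int, upload_slots: int) -> int:
    """Count rounds until every node holds the file in a toy model.

    Assumptions (not from the paper): one node starts with the file, and
    in each round every holder uploads a full copy to `upload_slots`
    nodes that do not have it yet.
    """
    if num_nodes < 1 or upload_slots < 1:
        raise ValueError("num_nodes and upload_slots must be positive")
    holders = 1
    rounds = 0
    while holders < num_nodes:
        # Every current holder serves `upload_slots` new nodes this round.
        holders += holders * upload_slots
        rounds += 1
    return rounds


if __name__ == "__main__":
    for slots in (1, 2, 4, 8):
        print(slots, rounds_to_distribute(1024, slots))
```

In this model the number of holders grows by a factor of `upload_slots + 1` per round, so the round count scales roughly with the logarithm of the node count. Any mechanism that restricts which peers may download at a given moment changes the base of that logarithm.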

    Cited By

    • (2023) TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1978-1988. DOI: 10.1145/3624062.3624277. Online publication date: 12-Nov-2023

    Published In

    ROSS '14: Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers
    June 2014
    76 pages
    ISBN:9781450329507
    DOI:10.1145/2612262
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • SPCL: Scalable Parallel Computing Laboratory

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    ROSS '14 Paper Acceptance Rate 9 of 16 submissions, 56%;
    Overall Acceptance Rate 58 of 169 submissions, 34%
