DOI: 10.1145/3219104.3229276

Navigating the Unexpected Realities of Big Data Transfers in a Cloud-based World

Published: 22 July 2018

Abstract

The emergence of big data has created new challenges for researchers transmitting big data sets across campus networks to local (HPC) cloud resources, or over wide area networks to public cloud services. Unlike conventional HPC systems where the network is carefully architected (e.g., a high speed local interconnect, or a wide area connection between Data Transfer Nodes), today's big data communication often occurs over shared network infrastructures with many external and uncontrolled factors influencing performance.
This paper describes our efforts to understand and characterize the performance of big data transfer tools such as rclone, Cyberduck, and provider-specific CLI tools when moving data to and from public and private cloud resources. We analyze the parameter settings available in each of these tools and their impact on performance. Our experimental results give insight into the performance of cloud providers and transfer tools, and provide guidance on parameter settings when using cloud transfer tools. We also explore performance when transfers originate from HPC Data Transfer Nodes (DTNs) as well as from researcher machines located deep in the campus network, and show that emerging software-defined networking (SDN) approaches such as the VIP Lanes system can deliver excellent performance even from researchers' machines.
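As an illustration of the kind of parameter sweep the abstract describes, the following is a minimal sketch, not the authors' benchmarking harness. It builds `rclone copy` command lines for several values of rclone's `--transfers` flag and converts a measured byte count and elapsed time into throughput. The source path, remote name, and the particular values swept are hypothetical placeholders, and the helper function names are introduced here for illustration only.

```python
import shlex


def throughput_gbps(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time into gigabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e9


def build_rclone_cmd(src: str, dst: str, transfers: int) -> list[str]:
    """Build an rclone copy command with a given number of parallel file transfers."""
    return ["rclone", "copy", src, dst, "--transfers", str(transfers)]


def sweep_commands(src: str, dst: str, transfers_values: list[int]) -> list[list[str]]:
    """Return one command per parameter setting; running and timing them is left to the caller."""
    return [build_rclone_cmd(src, dst, t) for t in transfers_values]


if __name__ == "__main__":
    # Hypothetical source directory and remote; replace with real ones before running rclone.
    for cmd in sweep_commands("/data/sample", "remote:bucket/sample", [1, 4, 16]):
        print(shlex.join(cmd))
    # Example conversion: 1.25 GB moved in 10 seconds.
    print(throughput_gbps(1_250_000_000, 10.0))
```

Each command would be timed separately (for example with `time.perf_counter()` around a `subprocess.run` call), and the resulting throughput compared across settings and across source hosts.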



      Published In

      PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity
      July 2018
      652 pages
      ISBN:9781450364461
      DOI:10.1145/3219104

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Big Data Flows
      2. Data Transfer Tools
      3. Software-Defined Networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      PEARC '18

      Acceptance Rates

      PEARC '18 Paper Acceptance Rate 79 of 123 submissions, 64%;
      Overall Acceptance Rate 133 of 202 submissions, 66%
