DOI: 10.1145/2486001.2486021

Leveraging endpoint flexibility in data-intensive clusters

Published: 27 August 2013
    Abstract

    Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large share of a cluster's network bytes: by choosing endpoints that avoid congested links, the completion times of these transfers, as well as those of transfers without similar flexibility, can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed on any subset of machines, as long as they span multiple fault domains and preserve a balanced use of storage throughout the cluster.
    We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad, a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3× faster on EC2 (1.58× in simulation) as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad achieves this with little impact on long-term storage balance.
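    To make the placement idea concrete, here is a minimal sketch, in Python, of greedy network-aware replica placement in the spirit of what the abstract describes: rank racks by current downlink load and put each replica in a distinct, lightly loaded rack (fault domain). Every name below (place_replicas, racks, link_load) is a hypothetical illustration, not Sinbad's actual interface; the paper's algorithm additionally accounts for block sizes, link capacities, and storage balance.

        import heapq

        # Hypothetical sketch only; all identifiers are illustrative, not Sinbad's API.
        def place_replicas(racks, link_load, num_replicas=3):
            """Pick (rack, machine) destinations for a CFS block's replicas.

            racks:     dict of rack id -> list of candidate machines
            link_load: dict of rack id -> estimated load on that rack's downlink
            Returns num_replicas destinations in distinct racks (fault domains),
            preferring the least congested downlinks.
            """
            # Greedy step: take the num_replicas least-loaded racks.
            best = heapq.nsmallest(num_replicas, racks,
                                   key=lambda r: link_load.get(r, 0.0))
            placement = []
            for rack in best:
                if racks[rack]:
                    # Any machine in the rack is equivalent for the congestion
                    # objective; a real system would also balance storage here.
                    placement.append((rack, racks[rack][0]))
            return placement

        # Toy example: rack r2's downlink is congested, so replicas avoid it.
        racks = {"r1": ["m1", "m2"], "r2": ["m3"], "r3": ["m4"]}
        load = {"r1": 10.0, "r2": 900.0, "r3": 25.0}
        print(place_replicas(racks, load, num_replicas=2))
        # -> [('r1', 'm1'), ('r3', 'm4')]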



    Published In

    SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
    August 2013, 580 pages
    ISBN: 9781450320566
    DOI: 10.1145/2486001

    Also appears in ACM SIGCOMM Computer Communication Review, Volume 43, Issue 4
    October 2013, 595 pages
    ISSN: 0146-4833
    DOI: 10.1145/2534169


    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. cluster file systems
    2. constrained anycast
    3. data-intensive applications
    4. datacenter networks
    5. replica placement

    Qualifiers

    • Research-article

    Conference

    SIGCOMM '13: ACM SIGCOMM 2013 Conference
    August 12-16, 2013
    Hong Kong, China

    Acceptance Rates

    SIGCOMM '13 paper acceptance rate: 38 of 246 submissions (15%)
    Overall acceptance rate: 462 of 3,389 submissions (14%)

    Article Metrics

    Downloads (last 12 months): 57
    Downloads (last 6 weeks): 4
    Reflects downloads up to 12 August 2024.

    Cited By

    • (2024) Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses. In Proceedings of the 53rd International Conference on Parallel Processing, pages 1197-1206. DOI: 10.1145/3673038.3673103
    • (2023) Toward Optimal Repair and Load Balance in Locally Repairable Codes. In Proceedings of the 52nd International Conference on Parallel Processing, pages 725-735. DOI: 10.1145/3605573.3605635
    • (2023) Adaptive and Scalable Caching With Erasure Codes in Distributed Cloud-Edge Storage Systems. IEEE Transactions on Cloud Computing, 11(2):1840-1853. DOI: 10.1109/TCC.2022.3168662
    • (2023) Optimal Rack-Coordinated Updates in Erasure-Coded Data Centers: Design and Analysis. IEEE Transactions on Computers, pages 1-14. DOI: 10.1109/TC.2023.3234215
    • (2023) BPR: An Erasure Coding Batch Parallel Repair Approach in Distributed Storage Systems. IEEE Access, 11:44509-44518. DOI: 10.1109/ACCESS.2023.3257404
    • (2022) Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance. In Proceedings of the 51st International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3545008.3545038
    • (2022) Understanding the Performance of Erasure Codes in Hadoop Distributed File System. In Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, pages 24-32. DOI: 10.1145/3503646.3524296
    • (2022) PushBox: Making Use of Every Bit of Time to Accelerate Completion of Data-Parallel Jobs. IEEE Transactions on Parallel and Distributed Systems, 33(12):4256-4269. DOI: 10.1109/TPDS.2022.3182037
    • (2022) Bandwidth-Aware Scheduling Repair Techniques in Erasure-Coded Clusters: Design and Analysis. IEEE Transactions on Parallel and Distributed Systems, 33(12):3333-3348. DOI: 10.1109/TPDS.2022.3153061
    • (2022) Co-Scheduler: A Coflow-Aware Data-Parallel Job Scheduler in Hybrid Electrical/Optical Datacenter Networks. IEEE/ACM Transactions on Networking, 30(4):1599-1612. DOI: 10.1109/TNET.2022.3143232
