DOI: 10.1145/2486001.2486021

Leveraging endpoint flexibility in data-intensive clusters

Published: 27 August 2013
    Abstract

    Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large share of a cluster's network bytes: by choosing endpoints that avoid congested links, the completion times of these transfers, as well as those of transfers without similar flexibility, can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed on any subset of machines, as long as they span multiple fault domains and preserve a balanced use of storage throughout the cluster.
    We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad, a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3× faster on EC2 (1.58× in simulation) as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad achieves this with little impact on long-term storage balance.
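    To make the placement idea concrete, here is a minimal sketch, in Python, of greedy network-aware replica placement in the spirit of what the abstract describes: rank racks by current downlink load and put each replica in a distinct, lightly loaded rack (fault domain). Every name below (place_replicas, racks, link_load) is a hypothetical illustration, not Sinbad's actual interface; the paper's algorithm additionally accounts for block sizes, link capacities, and storage balance.

        import heapq

        # Hypothetical sketch only; all identifiers are illustrative, not Sinbad's API.
        def place_replicas(racks, link_load, num_replicas=3):
            """Pick (rack, machine) destinations for a CFS block's replicas.

            racks:     dict of rack id -> list of candidate machines
            link_load: dict of rack id -> estimated load on that rack's downlink
            Returns num_replicas destinations in distinct racks (fault domains),
            preferring the least congested downlinks.
            """
            # Greedy step: take the num_replicas least-loaded racks.
            best = heapq.nsmallest(num_replicas, racks,
                                   key=lambda r: link_load.get(r, 0.0))
            placement = []
            for rack in best:
                if racks[rack]:
                    # Any machine in the rack is equivalent for the congestion
                    # objective; a real system would also balance storage here.
                    placement.append((rack, racks[rack][0]))
            return placement

        # Toy example: rack r2's downlink is congested, so replicas avoid it.
        racks = {"r1": ["m1", "m2"], "r2": ["m3"], "r3": ["m4"]}
        load = {"r1": 10.0, "r2": 900.0, "r3": 25.0}
        print(place_replicas(racks, load, num_replicas=2))
        # -> [('r1', 'm1'), ('r3', 'm4')]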



    Published In

    SIGCOMM '13: Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
    August 2013, 580 pages
    ISBN: 9781450320566
    DOI: 10.1145/2486001

    Also appears in ACM SIGCOMM Computer Communication Review, Volume 43, Issue 4
    October 2013, 595 pages
    ISSN: 0146-4833
    DOI: 10.1145/2534169


    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. cluster file systems
    2. constrained anycast
    3. data-intensive applications
    4. datacenter networks
    5. replica placement

    Qualifiers

    • Research-article

    Conference

    SIGCOMM '13: ACM SIGCOMM 2013 Conference
    August 12-16, 2013
    Hong Kong, China

    Acceptance Rates

    SIGCOMM '13 paper acceptance rate: 38 of 246 submissions (15%)
    Overall acceptance rate: 462 of 3,389 submissions (14%)

    Article Metrics

    Downloads (last 12 months): 57
    Downloads (last 6 weeks): 4
    Reflects downloads up to 12 August 2024.

    Cited By

    • (2024) Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses. In Proceedings of the 53rd International Conference on Parallel Processing, pages 1197-1206. DOI: 10.1145/3673038.3673103
    • (2023) Toward Optimal Repair and Load Balance in Locally Repairable Codes. In Proceedings of the 52nd International Conference on Parallel Processing, pages 725-735. DOI: 10.1145/3605573.3605635
    • (2023) Adaptive and Scalable Caching With Erasure Codes in Distributed Cloud-Edge Storage Systems. IEEE Transactions on Cloud Computing, 11(2):1840-1853. DOI: 10.1109/TCC.2022.3168662
    • (2023) Optimal Rack-Coordinated Updates in Erasure-Coded Data Centers: Design and Analysis. IEEE Transactions on Computers, pages 1-14. DOI: 10.1109/TC.2023.3234215
    • (2023) BPR: An Erasure Coding Batch Parallel Repair Approach in Distributed Storage Systems. IEEE Access, 11:44509-44518. DOI: 10.1109/ACCESS.2023.3257404
    • (2022) Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance. In Proceedings of the 51st International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3545008.3545038
    • (2022) Understanding the Performance of Erasure Codes in Hadoop Distributed File System. In Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, pages 24-32. DOI: 10.1145/3503646.3524296
    • (2022) PushBox: Making Use of Every Bit of Time to Accelerate Completion of Data-Parallel Jobs. IEEE Transactions on Parallel and Distributed Systems, 33(12):4256-4269. DOI: 10.1109/TPDS.2022.3182037
    • (2022) Bandwidth-Aware Scheduling Repair Techniques in Erasure-Coded Clusters: Design and Analysis. IEEE Transactions on Parallel and Distributed Systems, 33(12):3333-3348. DOI: 10.1109/TPDS.2022.3153061
    • (2022) Co-Scheduler: A Coflow-Aware Data-Parallel Job Scheduler in Hybrid Electrical/Optical Datacenter Networks. IEEE/ACM Transactions on Networking, 30(4):1599-1612. DOI: 10.1109/TNET.2022.3143232
