DOI: 10.1145/2619239.2626316

CONGA: distributed congestion-aware load balancing for datacenters

Published: 17 August 2014

Abstract

We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any TCP modifications. CONGA has been implemented in custom ASICs as part of a new datacenter fabric. In testbed experiments, CONGA has 5x better flow completion times than ECMP even with a single link failure and achieves 2-8x better throughput than MPTCP in Incast scenarios. Further, the Price of Anarchy for CONGA is provably small in Leaf-Spine topologies; hence CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds. Our main thesis is that datacenter fabric load balancing is best done in the network, and requires global schemes such as CONGA to handle asymmetry.
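
The per-flowlet path choice described in the abstract can be illustrated with a minimal sketch. This is not the ASIC implementation: the class `LeafSwitch`, its method names, the 500-microsecond inactivity gap, and the scalar per-path congestion metric are all assumptions made for illustration, since the abstract does not specify how congestion is measured or how flowlets are delimited.

```python
from dataclasses import dataclass, field

FLOWLET_GAP_US = 500  # assumed inactivity gap that starts a new flowlet


@dataclass
class LeafSwitch:
    """Toy model (hypothetical) of a source switch picking a path per flowlet."""
    num_paths: int
    # Latest congestion metric per path, as fed back by remote switches.
    congestion: list = field(default_factory=list)
    # flow id -> (path of the current flowlet, arrival time of its last packet)
    flowlets: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.congestion:
            self.congestion = [0.0] * self.num_paths

    def on_feedback(self, path, metric):
        """Record congestion feedback for one path."""
        self.congestion[path] = metric

    def route(self, flow_id, now_us):
        """Return the path for a packet of flow_id arriving at time now_us."""
        entry = self.flowlets.get(flow_id)
        if entry is not None and now_us - entry[1] <= FLOWLET_GAP_US:
            # Same flowlet: stay on the current path to avoid reordering.
            path = entry[0]
        else:
            # New flowlet: take the least-congested path (lowest index on ties).
            path = min(range(self.num_paths), key=lambda p: self.congestion[p])
        self.flowlets[flow_id] = (path, now_us)
        return path


leaf = LeafSwitch(num_paths=2)
leaf.on_feedback(0, 0.9)
leaf.on_feedback(1, 0.1)
first = leaf.route("flowA", now_us=0)     # new flowlet, path 1 is less congested
leaf.on_feedback(1, 0.95)                 # path 1 becomes the more congested one
second = leaf.route("flowA", now_us=100)  # gap of 100 us: same flowlet, same path
third = leaf.route("flowA", now_us=1000)  # gap of 900 us: new flowlet, re-select
```

In this sketch a flow moves to a better path only at a flowlet boundary, so packets within a burst stay in order and TCP itself needs no changes.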

    Published In

    SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMM
    August 2014
    662 pages
    ISBN: 9781450328364
    DOI: 10.1145/2619239

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. datacenter fabric
    2. distributed
    3. load balancing

    Qualifiers

    • Research-article

    Conference

    SIGCOMM '14: ACM SIGCOMM 2014 Conference
    August 17-22, 2014
    Chicago, Illinois, USA

    Acceptance Rates

    SIGCOMM '14 paper acceptance rate: 45 of 242 submissions, 19%
    Overall acceptance rate: 462 of 3,389 submissions, 14%
