DOI: 10.1145/3544216.3544226
research-article
Open access

PLB: congestion signals are simple and effective for network load balancing

Published: 22 August 2022

    Abstract

    We present a new, host-based design for link load balancing and report the first experiences of link imbalance in datacenters. Our design, PLB (Protective Load Balancing), builds on transport protocols and ECMP/WCMP to reduce network hotspots. PLB randomly changes the paths of connections that experience congestion, preferring to repath after idle periods to minimize packet reordering. It repaths a connection by changing the IPv6 Flow Label on its packets, which switches include as part of ECMP/WCMP. Across hosts, this action drives down hotspots in the network, and lowers the latency of RPCs.
    PLB is used fleetwide at Google for TCP and Pony Express traffic. We could deploy it when other designs were infeasible because PLB requires only small transport modifications and switch configuration changes, and is backwards-compatible. It has produced excellent gains: the median utilization imbalance of highly-loaded ToR uplinks in Google datacenters fell by 60%, packet drops correspondingly fell by 33%, and the tail latency (99p) of small RPCs fell by 20%. PLB is also a general solution that works for settings from datacenters to backbone networks, as well as different transports.
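
    The repathing behavior described in the abstract can be sketched as a small per-connection state machine. This is a minimal illustration, not the authors' implementation: the congestion signal (a per-round fraction of congestion-marked packets), the threshold names and their values, and the toy `ecmp_path` hash are all assumptions introduced here. Only the 20-bit width of the IPv6 Flow Label is a fixed property of the protocol.

```python
import hashlib
import random
from dataclasses import dataclass, field

FLOW_LABEL_BITS = 20  # width of the IPv6 Flow Label field


def ecmp_path(src: str, dst: str, flow_label: int, num_paths: int) -> int:
    """Toy stand-in for a switch ECMP hash that includes the Flow Label."""
    key = f"{src}|{dst}|{flow_label}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


@dataclass
class PlbSketch:
    """Hypothetical per-connection state; all thresholds are illustrative."""
    congestion_threshold: float = 0.5  # assumed: marked fraction that counts as congested
    idle_repath_rounds: int = 3        # assumed: congested rounds needed to repath when idle
    forced_repath_rounds: int = 12     # assumed: congested rounds that force a repath
    rng: random.Random = field(default_factory=random.Random)
    flow_label: int = 0
    consecutive_congested_rounds: int = 0

    def __post_init__(self) -> None:
        self.flow_label = self._new_label()

    def _new_label(self) -> int:
        # Draw a random nonzero 20-bit label that differs from the current one.
        while True:
            label = self.rng.randrange(1, 1 << FLOW_LABEL_BITS)
            if label != self.flow_label:
                return label

    def on_round_end(self, marked_fraction: float) -> None:
        # Count consecutive congested rounds; any uncongested round resets the count.
        if marked_fraction >= self.congestion_threshold:
            self.consecutive_congested_rounds += 1
        else:
            self.consecutive_congested_rounds = 0

    def maybe_repath(self, is_idle: bool) -> bool:
        # Prefer repathing after an idle period (little risk of reordering);
        # repath mid-transfer only after congestion has persisted much longer.
        n = self.consecutive_congested_rounds
        if n >= self.forced_repath_rounds or (is_idle and n >= self.idle_repath_rounds):
            self.flow_label = self._new_label()
            self.consecutive_congested_rounds = 0
            return True
        return False
```

    In this sketch a connection calls `on_round_end` once per round trip and `maybe_repath` before sending. A new label changes the input to `ecmp_path`, so the connection is likely, though not guaranteed, to land on a different link.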

    Supplementary Material

    PDF File (p207-qureshi-supp.pdf)
    Supplemental material.

        Information

        Published In

        SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference
        August 2022
        858 pages
        ISBN:9781450394208
        DOI:10.1145/3544216
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. congestion control
        2. datacenter fabric
        3. distributed
        4. load balancing

        Qualifiers

        • Research-article

        Conference

        SIGCOMM '22: ACM SIGCOMM 2022 Conference
        August 22 - 26, 2022
        Amsterdam, Netherlands

        Acceptance Rates

        Overall Acceptance Rate 462 of 3,389 submissions, 14%

        Article Metrics

        • Downloads (Last 12 months): 2,741
        • Downloads (Last 6 weeks): 288
        Reflects downloads up to 12 Aug 2024

        Cited By

        • (2024) Opportunistic Packet Forwarding for Proactive Transport in Datacenters. 2024 IFIP Networking Conference (IFIP Networking), pp. 1-9. DOI: 10.23919/IFIPNetworking62109.2024.10619903. Online publication date: 3-Jun-2024
        • (2024) POSTER: CAVER: Enhancing RDMA Load Balancing by Hunting Less-Congested Paths. Proceedings of the ACM SIGCOMM 2024 Conference: Posters and Demos, pp. 39-41. DOI: 10.1145/3672202.3673729. Online publication date: 4-Aug-2024
        • (2024) Network Load Balancing with Parallel Flowlets for AI Training Clusters. Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing, pp. 18-25. DOI: 10.1145/3672198.3673794. Online publication date: 4-Aug-2024
        • (2024) Alibaba HPN: A Data Center Network for Large Language Model Training. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 691-706. DOI: 10.1145/3651890.3672265. Online publication date: 4-Aug-2024
        • (2024) Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 540-553. DOI: 10.1145/3651890.3672241. Online publication date: 4-Aug-2024
        • (2024) μMon: Empowering Microsecond-level Network Monitoring with Wavelets. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 274-290. DOI: 10.1145/3651890.3672236. Online publication date: 4-Aug-2024
        • (2024) Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 750-763. DOI: 10.1145/3651890.3672224. Online publication date: 4-Aug-2024
        • (2024) QALL: Distributed Queue-Behavior-Aware Load Balancing Using Programmable Data Planes. IEEE Transactions on Network and Service Management 21:2, pp. 2303-2322. DOI: 10.1109/TNSM.2023.3345862. Online publication date: Apr-2024
        • (2024) Cyclic Matrix Coding to Mitigate ACK Blocking of MPTCP in Data Center Networks. IEEE Transactions on Cloud Computing 12:2, pp. 419-430. DOI: 10.1109/TCC.2024.3366534. Online publication date: Apr-2024
        • (2024) DDR: A Deadline-Driven Routing Protocol for Delay Guaranteed Service. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, pp. 941-950. DOI: 10.1109/INFOCOM52122.2024.10621415. Online publication date: 20-May-2024
