DOI: 10.1145/3098822.3098825

Re-architecting datacenter networks and stacks for low latency and high performance

Published: 07 August 2017

Abstract

Modern datacenter networks provide very high capacity via redundant Clos topologies and low switch latency, but transport protocols rarely deliver matching performance. We present NDP, a novel datacenter transport architecture that achieves near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios, including incast. NDP switch buffers are very shallow; when they fill, the switches trim packets to headers and priority-forward the headers. This gives receivers a full view of instantaneous demand from all senders, and is the basis for our novel, high-performance, multipath-aware transport protocol that can deal gracefully with massive incast events and prioritize traffic from different senders on RTT timescales. We implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4. We evaluate NDP's performance in our implementations and in large-scale simulations, simultaneously demonstrating support for very low latency and high throughput.
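
The trimming behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Packet`, `TrimmingQueue`, and `incast_demo`, the 64-byte header size, the 8-packet data queue, the strict priority of headers over data, and the unbounded header queue are all assumptions made for illustration; the abstract states only that buffers are shallow, that full buffers cause packets to be trimmed to headers, and that headers are priority forwarded.

```python
from collections import deque
from dataclasses import dataclass, replace

HEADER_BYTES = 64  # assumed header size, not taken from the abstract


@dataclass(frozen=True)
class Packet:
    src: str               # sender identifier
    seq: int               # sequence number within the flow
    size: int              # bytes on the wire
    trimmed: bool = False  # True once the payload has been cut off


class TrimmingQueue:
    """Hypothetical output port: shallow data queue plus a header queue."""

    def __init__(self, data_capacity: int = 8):
        self.data_capacity = data_capacity  # assumed depth, in packets
        self.data = deque()
        self.headers = deque()              # unbounded here for simplicity

    def enqueue(self, pkt: Packet) -> None:
        if pkt.trimmed or len(self.data) >= self.data_capacity:
            # Data queue is full (or the packet was already trimmed upstream):
            # keep only the header instead of dropping the packet.
            self.headers.append(replace(pkt, size=HEADER_BYTES, trimmed=True))
        else:
            self.data.append(pkt)

    def dequeue(self):
        # Headers are served first (strict priority is an assumption).
        if self.headers:
            return self.headers.popleft()
        if self.data:
            return self.data.popleft()
        return None


def incast_demo(n_senders: int = 12, data_capacity: int = 8):
    """Send one full-size packet per sender into one port, then drain it."""
    q = TrimmingQueue(data_capacity)
    for i in range(n_senders):
        q.enqueue(Packet(src=f"s{i}", seq=0, size=1500))
    return [q.dequeue() for _ in range(n_senders)]


if __name__ == "__main__":
    for pkt in incast_demo():
        print(pkt.src, pkt.size, "trimmed" if pkt.trimmed else "full")
```

In this sketch, every sender's packet reaches the receiver in some form: the packets that overflow the data queue arrive as headers ahead of the queued data, which is how a receiver could learn of all senders' demand even under incast.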

Supplementary Material

WEBM File (rearchitectingdatacenternetworksandstacksforlowlatencyandhighperformance.webm)

      Published In

      SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication
      August 2017
      515 pages
      ISBN: 9781450346535
      DOI: 10.1145/3098822
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Datacenters
      2. Network Stacks
      3. Transport Protocols

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGCOMM '17: ACM SIGCOMM 2017 Conference
      August 21 - 25, 2017
      Los Angeles, CA, USA

      Acceptance Rates

      Overall Acceptance Rate 462 of 3,389 submissions, 14%

      Article Metrics

      • Downloads (Last 12 months): 2,007
      • Downloads (Last 6 weeks): 237
      Reflects downloads up to 25 Jan 2025

      Cited By

      • (2025) A multilevel network-assisted congestion feedback mechanism for network congestion control. Computers and Electrical Engineering 123 (110067). DOI: 10.1016/j.compeleceng.2025.110067. Online publication date: Apr-2025
      • (2024) Towards domain-specific network transport for distributed DNN training. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (1421-1443). DOI: 10.5555/3691825.3691904. Online publication date: 16-Apr-2024
      • (2024) Multitenant in-network acceleration with SwitchVM. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (691-708). DOI: 10.5555/3691825.3691863. Online publication date: 16-Apr-2024
      • (2024) Harmony. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (329-343). DOI: 10.5555/3691825.3691845. Online publication date: 16-Apr-2024
      • (2024) A large-scale deployment of DCTCP. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (239-252). DOI: 10.5555/3691825.3691839. Online publication date: 16-Apr-2024
      • (2024) Revisiting congestion control for lossless ethernet. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (131-148). DOI: 10.5555/3691825.3691833. Online publication date: 16-Apr-2024
      • (2024) Flow scheduling with imprecise knowledge. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (95-111). DOI: 10.5555/3691825.3691831. Online publication date: 16-Apr-2024
      • (2024) Congestion Control Mechanism Based on Backpressure Feedback in Data Center Networks. Future Internet 16:4 (131). DOI: 10.3390/fi16040131. Online publication date: 15-Apr-2024
      • (2024) Network Data Plane Programming Languages: A Survey. Computers 13:12 (314). DOI: 10.3390/computers13120314. Online publication date: 26-Nov-2024
      • (2024) Opportunistic Packet Forwarding for Proactive Transport in Datacenters. 2024 IFIP Networking Conference (IFIP Networking) (1-9). DOI: 10.23919/IFIPNetworking62109.2024.10619903. Online publication date: 3-Jun-2024
