DOI: 10.1145/3544216.3544235
research-article

dcPIM: near-optimal proactive datacenter transport

Published: 22 August 2022

Abstract

Datacenter Parallel Iterative Matching (dcPIM) is a proactive datacenter transport design that simultaneously achieves near-optimal tail latency for short flows and near-optimal network utilization, without requiring any specialized network hardware.
dcPIM has its intellectual roots in the classical Parallel Iterative Matching (PIM) protocol, variations of which are used in almost all switch fabrics. The key technical result in dcPIM is a new theoretical analysis of PIM for the datacenter context: we show that, unlike switch fabrics, where PIM requires log(n) rounds of control-plane messages (for an n-port switch fabric) to guarantee near-optimal network utilization, the datacenter context enables PIM to guarantee near-optimal utilization with a constant number of rounds (independent of the number of hosts in the datacenter)! The dcPIM design builds upon insights gained from this analysis and extends PIM to overcome the unique challenges introduced by datacenter networks (much larger scale and longer round-trip times than switch fabrics). We demonstrate, both theoretically and empirically, that dcPIM's performance is near-optimal.
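The request/grant/accept rounds that dcPIM inherits from classical PIM can be illustrated with a small simulation. The sketch below is a minimal, single-process Python model of the classical switch-fabric PIM protocol, not of dcPIM itself: it omits the datacenter-specific extensions the abstract mentions (handling much larger scale and longer round-trip times). The function names `pim` and `is_maximal`, the `demand` mapping from each sender to the set of receivers it has pending data for, and the `max_rounds` parameter are hypothetical illustrations. `max_rounds` stands in for the round budget the abstract contrasts: log(n) rounds for an n-port switch fabric versus a constant number for dcPIM.

```python
import random
from typing import Dict, List, Set, Tuple


def pim(demand: Dict[int, Set[int]], max_rounds: int,
        rng: random.Random) -> Tuple[Dict[int, int], int]:
    """Classical PIM sketch: returns ({sender: receiver}, rounds_used).

    demand[s] is the (hypothetical) set of receivers sender s has data for.
    """
    match: Dict[int, int] = {}   # sender -> receiver
    taken: Set[int] = set()      # receivers already matched
    rounds_used = 0
    for _ in range(max_rounds):
        # Request: every unmatched sender asks every unmatched receiver it has data for.
        requests: Dict[int, List[int]] = {}
        for s, dsts in demand.items():
            if s in match:
                continue
            for r in sorted(dsts):
                if r not in taken:
                    requests.setdefault(r, []).append(s)
        if not requests:
            break  # no unmatched sender/receiver pair left: matching is maximal
        rounds_used += 1
        # Grant: each requested receiver grants one request uniformly at random.
        grants: Dict[int, List[int]] = {}
        for r, senders in requests.items():
            grants.setdefault(rng.choice(senders), []).append(r)
        # Accept: each granted sender accepts one grant uniformly at random.
        # A receiver grants at most once, so accepted pairs never conflict.
        for s, receivers in grants.items():
            r = rng.choice(receivers)
            match[s] = r
            taken.add(r)
    return match, rounds_used


def is_maximal(demand: Dict[int, Set[int]], match: Dict[int, int]) -> bool:
    """True if no unmatched sender still has data for an unmatched receiver."""
    taken = set(match.values())
    return all(s in match or dsts <= taken for s, dsts in demand.items())


if __name__ == "__main__":
    n = 64
    # Hypothetical all-to-all demand: every sender has data for every receiver.
    demand = {s: set(range(n)) for s in range(n)}
    match, used = pim(demand, max_rounds=n, rng=random.Random(1))
    print(f"matched {len(match)}/{n} senders in {used} rounds; "
          f"maximal={is_maximal(demand, match)}")
    # A small constant round budget may leave some pairs unmatched.
    partial, _ = pim(demand, max_rounds=2, rng=random.Random(1))
    print(f"2-round budget matched {len(partial)}/{n} senders")
```

Every round that still has outstanding requests adds at least one match, so with a generous budget (here `max_rounds = n`) the loop always ends in a maximal matching. Capping `max_rounds` at a small constant instead returns a possibly non-maximal matching; the abstract's claim is that, in the datacenter setting, such a constant cap already suffices for near-optimal utilization.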

Supplementary Material

PDF File (p53-cai-supp.pdf): supplemental material.

Information

Published In

ACM Conferences
SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference
August 2022
858 pages
ISBN:9781450394208
DOI:10.1145/3544216
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datacenter networks
  2. flow scheduling
  3. proactive transport

Qualifiers

  • Research-article

Funding Sources

  • Sloan Fellowship
  • NSF
  • Google Faculty Research Award

Conference

SIGCOMM '22: ACM SIGCOMM 2022 Conference
August 22-26, 2022
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 281
  • Downloads (last 6 weeks): 26
Reflects downloads up to 02 Feb 2025.

Cited By

  • (2024) Credence. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 613-634. DOI: 10.5555/3691825.3691859. Online publication date: 16-Apr-2024.
  • (2024) Harmony. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 329-343. DOI: 10.5555/3691825.3691845. Online publication date: 16-Apr-2024.
  • (2024) POSTER: Opportunistic Credit-Based Transport for Reconfigurable Data Center Networks with Tidal. Proceedings of the ACM SIGCOMM 2024 Conference: Posters and Demos, pp. 4-6. DOI: 10.1145/3672202.3673714. Online publication date: 4-Aug-2024.
  • (2024) Rethinking Transport Protocols for Reconfigurable Data Centers: An Empirical Study. Proceedings of the 1st SIGCOMM Workshop on Hot Topics in Optical Technologies and Applications in Networking, pp. 7-13. DOI: 10.1145/3672201.3674120. Online publication date: 4-Aug-2024.
  • (2024) LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 7918-7927. DOI: 10.1145/3664647.3681670. Online publication date: 28-Oct-2024.
  • (2024) NegotiaToR: Towards A Simple Yet Effective On-demand Reconfigurable Datacenter Network. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 415-432. DOI: 10.1145/3651890.3672222. Online publication date: 4-Aug-2024.
  • (2024) Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters. IEEE Transactions on Parallel and Distributed Systems 35:9, pp. 1536-1550. DOI: 10.1109/TPDS.2024.3418620. Online publication date: Sep-2024.
  • (2024) Alleviating Congestion via Switch Design for Fair Buffer Allocation in Datacenters. IEEE Transactions on Cloud Computing 12:1, pp. 219-231. DOI: 10.1109/TCC.2024.3357595. Online publication date: Jan-2024.
  • (2024) Mild: A Zero-Wait Multi-Round Proactive Transport. 2024 IEEE Symposium on Computers and Communications (ISCC), pp. 1-7. DOI: 10.1109/ISCC61673.2024.10733697. Online publication date: 26-Jun-2024.
  • (2023) pUpdate: Priority-Based Scheduling for Continuous and Consistent Network Updates in SDN. 2023 19th International Conference on Network and Service Management (CNSM), pp. 1-7. DOI: 10.23919/CNSM59352.2023.10327840. Online publication date: 30-Oct-2023.
