Network Congestion Avoidance through Packet-chaining Reservation

Published: 05 August 2019

Abstract

Endpoint congestion is a bottleneck in high-performance computing (HPC) networks and severely degrades system performance, especially for latency-sensitive applications. For long messages (or flows) whose duration far exceeds the round-trip time (RTT), endpoint congestion can be mitigated effectively by proactive or reactive countermeasures that dynamically throttle the injection rate of each source to an appropriate level. However, many HPC applications produce hybrid traffic, a mix of short and long messages, that is dominated by short messages. Existing proactive congestion avoidance methods struggle to schedule the rapidly changing traffic patterns that these short messages cause. In this paper, we combine the advantages of proactive and reactive congestion avoidance techniques and propose the Packet-chaining Reservation Protocol (PCRP), which dynamically balances flows that follow proactive scheduling against packets that react to network conditions. We select a chain of packets as a flexible reservation granularity that lies between a whole flow and a single packet. We allow small flows to be transmitted speculatively without being discarded and give them higher priority throughout the network. PCRP responds quickly to network conditions, effectively avoids the formation of endpoint congestion, and reduces the average flow delay. We conduct extensive experiments to evaluate PCRP and compare it with two state-of-the-art proactive reservation-based protocols, the Speculative Reservation Protocol (SRP) and the Bilateral Flow Reservation Protocol (BFRP). The simulation results show that our design reduces flow latency by 50.2% for hotspot traffic and by 28.38% for uniform traffic.
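
The sending policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no concrete thresholds, so `SMALL_FLOW_PACKETS`, `CHAIN_LENGTH`, the `Transmission` record, and the `plan_flow` helper are all hypothetical names and values chosen for illustration. The sketch only shows the two ideas stated above: small flows go out speculatively at high priority, and longer flows are reserved one chain of packets at a time rather than per flow or per packet.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical parameters: the abstract does not state concrete values.
SMALL_FLOW_PACKETS = 4  # flows at or below this size are sent speculatively
CHAIN_LENGTH = 8        # number of packets covered by one reservation


@dataclass
class Transmission:
    mode: str      # "speculative" or "reserved"
    priority: str  # "high" or "normal"
    packets: int   # packets carried by this unit


def plan_flow(flow_packets: int) -> List[Transmission]:
    """Split a flow into transmission units (illustrative policy only)."""
    if flow_packets <= 0:
        raise ValueError("flow must contain at least one packet")

    # Small flows: sent speculatively with high priority, no reservation.
    if flow_packets <= SMALL_FLOW_PACKETS:
        return [Transmission("speculative", "high", flow_packets)]

    # Longer flows: reserve one chain at a time, a granularity that sits
    # between a whole-flow reservation and a per-packet reservation.
    plan = []
    remaining = flow_packets
    while remaining > 0:
        size = min(CHAIN_LENGTH, remaining)
        plan.append(Transmission("reserved", "normal", size))
        remaining -= size
    return plan


if __name__ == "__main__":
    for unit in plan_flow(20):
        print(unit)
```

Under these assumed values, a 20-packet flow is planned as three reserved chains of 8, 8, and 4 packets, while a 3-packet flow is sent as a single speculative unit.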

Information & Contributors

Information

Published In

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN:9781450362955
DOI:10.1145/3337821
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Flow completion time
  2. Interconnection network
  3. Low latency

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 04 Sep 2024

Citations

Cited By

  • (2024) AS-Router: A novel allocation service for efficient Network-on-Chip. Engineering Science and Technology, an International Journal 50 (101607). DOI: 10.1016/j.jestch.2023.101607. Online publication date: Feb-2024
  • (2023) BFC+: Analysis and Improvement of the Congestion Control Algorithm BFC. 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT) (132-135). DOI: 10.1109/AINIT59027.2023.10212521. Online publication date: 16-Jun-2023
  • (2023) EagerCC: An ultra-low latency congestion control mechanism in datacenter networks. Computer Networks 236 (110009). DOI: 10.1016/j.comnet.2023.110009. Online publication date: Nov-2023
  • (2022) MUA-Router: Maximizing the Utility-of-Allocation for On-chip Pipelining Routers. ACM Transactions on Architecture and Code Optimization 19:3 (1-23). DOI: 10.1145/3519027. Online publication date: 4-May-2022
  • (2022) Near-optimal sparse allreduce for distributed deep learning. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (135-149). DOI: 10.1145/3503221.3508399. Online publication date: 2-Apr-2022
  • (2022) FastCredit: Expediting credit-based congestion control in datacenters. Computer Networks 214 (109126). DOI: 10.1016/j.comnet.2022.109126. Online publication date: Sep-2022
  • (2022) Revisiting network congestion avoidance through adaptive packet-chaining reservation. Computer Networks: The International Journal of Computer and Telecommunications Networking 212:C. DOI: 10.1016/j.comnet.2022.109008. Online publication date: 20-Jul-2022
  • (2021) Receiver-Driven Congestion Control for InfiniBand. Proceedings of the 50th International Conference on Parallel Processing (1-10). DOI: 10.1145/3472456.3472466. Online publication date: 9-Aug-2021
  • (2021) SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router. IEEE Access 9 (126564-126578). DOI: 10.1109/ACCESS.2021.3111294. Online publication date: 2021
  • (2021) Taming Congestion and Latency in Low-Diameter High-Performance Datacenters. Network and Parallel Computing (229-242). DOI: 10.1007/978-3-030-93571-9_18. Online publication date: 3-Nov-2021
