Taming Congestion and Latency in Low-Diameter High-Performance Datacenters

  • Conference paper
Network and Parallel Computing (NPC 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13152)

Abstract

High-performance computing (HPC) systems and data centers are converging into high-performance data centers (HPDCs). An HPDC aims to provide extremely low latency for both HPC and data center workloads. Beyond adopting a low-diameter network topology, an HPDC also requires a more advanced congestion control mechanism. This paper implements a state-of-the-art data center congestion control method on the Dragonfly topology and proposes Bowshot, a fast and accurate congestion control method for low-latency HPDCs. Bowshot uses fine-grained feedback to describe the network state accurately. It reduces feedback delay through switch feedback, ACK-padding, and ACK-first, and it uses switch calculation to reduce the overhead of congestion control. In a large-scale evaluation, Bowshot reduces the average flow completion time (FCT) by 33% and the 99th-percentile FCT by 45% compared with the state-of-the-art work, and it reduces the feedback delay by 89%. In addition, Bowshot maintains higher throughput and a near-zero queue length.
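
The abstract reports its headline results as relative reductions in average FCT and 99th-percentile FCT. As a minimal sketch of how such figures are typically derived, the Python snippet below computes both metrics from per-flow completion times and expresses the change relative to a baseline. The function names, the nearest-rank percentile definition, and the sample values are assumptions made for illustration; they are not taken from the paper and do not reproduce its measurements.

```python
import math


def mean_fct(fcts):
    """Average flow completion time over a list of per-flow FCTs."""
    return sum(fcts) / len(fcts)


def percentile_fct(fcts, p):
    """p-th percentile FCT by the nearest-rank method, with 0 < p <= 100."""
    ordered = sorted(fcts)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical per-flow FCTs in microseconds. These are illustrative
# values only, not data from the paper's evaluation.
baseline_fcts = [100, 120, 150, 200, 1000]
improved_fcts = [80, 90, 100, 130, 550]

avg_cut = relative_reduction(mean_fct(baseline_fcts), mean_fct(improved_fcts))
p99_cut = relative_reduction(
    percentile_fct(baseline_fcts, 99),
    percentile_fct(improved_fcts, 99),
)

print(f"average FCT reduction: {avg_cut:.1%}")
print(f"99th-percentile FCT reduction: {p99_cut:.1%}")
```

With so few samples the 99th percentile is simply the slowest flow, which is why tail metrics are normally reported over large flow populations such as the large-scale evaluation the abstract mentions.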

Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments. The work was supported by the National Key R&D Program of China under Grant No. 2018YFB0204300, the Excellent Youth Foundation of Hunan Province (Dezun Dong), and the National Postdoctoral Program for Innovative Talents under Grant No. BX20190091.

Author information

Corresponding author

Correspondence to Dezun Dong.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Zhou, R., Dong, D., Huang, S., Zhou, Z., Bai, Y. (2022). Taming Congestion and Latency in Low-Diameter High-Performance Datacenters. In: Cérin, C., Qian, D., Gaudiot, JL., Tan, G., Zuckerman, S. (eds) Network and Parallel Computing. NPC 2021. Lecture Notes in Computer Science, vol 13152. Springer, Cham. https://doi.org/10.1007/978-3-030-93571-9_18

  • DOI: https://doi.org/10.1007/978-3-030-93571-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93570-2

  • Online ISBN: 978-3-030-93571-9

  • eBook Packages: Computer Science (R0)
