DOI: 10.1109/GLOBECOM46510.2021.9685963
research-article

Flow-level Adaptive Routing Scheme for RDMA enabled Dragonfly Network

Published: 07 December 2021

Abstract

To minimize the number of expensive global links, the Dragonfly topology has been widely adopted in today's data centers. However, deploying Remote Direct Memory Access (RDMA) applications on Dragonfly requires the network to route at the flow level to avoid packet reordering. The existing Dragonfly routing scheme uses queue length to estimate link load, which is not a suitable criterion for flow-level routing. In this paper, we use the amount of data remaining in each flow to estimate flow completion time and propose a routing scheme named Remaining Data-based Adaptive Load-balance (RDAL). We compare the performance of RDAL with another flow-level routing scheme. Our simulations show that, for flow-level routing, RDAL improves both average flow completion time and saturation throughput, especially under adversarial traffic patterns. At best, RDAL increases saturation throughput by 12% and reduces average flow completion time by 34% compared with UGAL, the state-of-the-art routing scheme for Dragonfly.
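The abstract does not give RDAL's decision rule, so the following is only an illustrative sketch of the general idea it describes: choosing a path per flow from the remaining data already assigned to each link, rather than from queue length. The names `Link`, `estimated_fct`, and `FlowRouter`, and the bottleneck-based completion-time estimate, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A link that tracks the bytes still to be sent by flows pinned to it."""
    capacity_bps: float
    remaining_bytes: int = 0


def estimated_fct(path, flow_bytes):
    """Estimate the completion time (seconds) of a new flow on `path`.

    Assumption (not from the paper): the flow finishes once the most
    loaded link on the path has drained its already-assigned remaining
    data plus the new flow's data.
    """
    return max((l.remaining_bytes + flow_bytes) * 8 / l.capacity_bps for l in path)


@dataclass
class FlowRouter:
    """Pins each flow to one path for its lifetime so packets stay in order."""
    assigned: dict = field(default_factory=dict)

    def route(self, flow_id, flow_bytes, minimal, non_minimal):
        # A flow that already has a path keeps it (flow-level routing).
        if flow_id in self.assigned:
            return self.assigned[flow_id]
        # Pick the candidate with the smaller estimated completion time;
        # on a tie, min() keeps the first candidate, i.e. the minimal path.
        path = min((minimal, non_minimal), key=lambda p: estimated_fct(p, flow_bytes))
        for link in path:
            link.remaining_bytes += flow_bytes
        self.assigned[flow_id] = path
        return path


# Hypothetical scenario: the minimal path's single link is already loaded.
minimal = [Link(1e9, remaining_bytes=10_000_000)]
non_minimal = [Link(1e9), Link(1e9)]
router = FlowRouter()
chosen = router.route("flow-1", 1_000_000, minimal, non_minimal)
```

In this scenario the router selects the longer, non-minimal path because its estimated completion time is lower. A fuller model would also decrement `remaining_bytes` as data is transmitted and account for hop count; both are omitted here for brevity.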


Published In

2021 IEEE Global Communications Conference (GLOBECOM)
Dec 2021
3571 pages

Publisher

IEEE Press
