Congestion control in high-speed lossless data center networks: A survey

Published: 01 December 2018

Abstract

In data centers, packet losses cause high retransmission delays, which harm many real-time workloads. To prevent packet losses, lossless fabrics have been deployed in many production data centers. However, when network congestion occurs, a lossless fabric also causes problems such as saturation trees and unfairness, which seriously degrade the performance of data center applications. How to control congestion in high-speed lossless data center networks is therefore a significant problem.
In this paper, we first introduce the link-layer flow control schemes that provide lossless fabrics. We then survey congestion control schemes in high-speed lossless data center networks, classifying existing schemes into two categories: reactive and proactive. Finally, we present the challenges and opportunities for future research in this area.

Highlights

• This paper introduces network congestion in high-speed lossless networks.
• This paper presents typical Link-layer Flow Control (LFC) schemes, which make a network lossless.
• This paper surveys several congestion control schemes deployed in lossless networks.
• This paper compares existing congestion control schemes and presents their merits and demerits.
• This paper puts forward several challenges and opportunities in designing congestion control schemes for high-speed lossless DCNs.

Published In

Future Generation Computer Systems, Volume 89, Issue C
Dec 2018
816 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. High-speed lossless data center network (DCN)
2. Network congestion
3. Congestion control

Qualifiers

• Research-article

Cited By

• (2024) COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign, ACM Transactions on Architecture and Code Optimization 21:3 (1-26), doi:10.1145/3660525. Online publication date: 22-Apr-2024
• (2023) GPU Cluster RDMA communication technology and congestion control, Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things (541-547), doi:10.1145/3603781.3603876. Online publication date: 26-May-2023
• (2023) Congestion Control for Datacenter Networks: A Control-Theoretic Approach, IEEE Transactions on Parallel and Distributed Systems 34:5 (1682-1696), doi:10.1109/TPDS.2023.3259799. Online publication date: 1-May-2023
• (2022) DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers, Proceedings of the 51st International Conference on Parallel Processing (1-11), doi:10.1145/3545008.3545023. Online publication date: 29-Aug-2022
• (2019) ExpressPass+, Proceedings of the ACM SIGCOMM 2019 Conference Posters and Demos (169-171), doi:10.1145/3342280.3342348. Online publication date: 19-Aug-2019
• (2019) Network Congestion Avoidance through Packet-chaining Reservation, Proceedings of the 48th International Conference on Parallel Processing (1-10), doi:10.1145/3337821.3337874. Online publication date: 5-Aug-2019
