research-article

Revisiting TCP Congestion Control in a Virtual Cluster Environment

Published: 01 August 2016

Abstract

Virtual machines (VMs) are widely adopted today to provide elastic computing services in datacenters, and they still rely heavily on TCP for congestion control. VM scheduling delays due to CPU sharing can cause frequent spurious retransmit timeouts (RTOs). Using current detection methods, we find that such spurious RTOs cannot be effectively identified because of the retransmission ambiguity caused by the delayed ACK (DelACK) mechanism. Disabling DelACK would add significant CPU overhead to the VMs and thus degrade the network's performance. In this paper, we first report our practical experience with TCP's reaction to VM scheduling delays. We then provide an analysis of the problem in two parts, corresponding to VM preemption on the sender side and on the receiver side, respectively. Finally, we propose PVTCP, a ParaVirtualized approach to counteract the distortion of congestion information caused by the hypervisor scheduler. PVTCP is completely embedded in the guest OS and requires no modification to the hypervisor. Taking incast congestion as an example, we evaluate our solution in a 21-node testbed. The results show that PVTCP adapts well to virtualized environments and deals satisfactorily with the throughput collapse problem.
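
As a rough illustration of why a scheduling delay can look like a loss to the sender, the sketch below implements the standard RTO estimator from RFC 6298 and checks whether an ACK held up while a VM is descheduled would arrive only after the retransmit timer fires. It is not PVTCP and is not taken from the paper. The names `RtoEstimator` and `is_spurious_timeout` are hypothetical, and the 10 ms RTO floor, 0.5 ms network RTT, and 30 ms scheduling delay are assumed values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Smoothing constants from RFC 6298.
ALPHA = 1 / 8
BETA = 1 / 4
K = 4


@dataclass
class RtoEstimator:
    """RFC 6298 style RTO estimator. All times are in seconds."""
    min_rto: float = 0.2          # assumed floor; real stacks differ
    granularity: float = 0.001    # assumed clock granularity G
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def update(self, rtt: float) -> float:
        """Feed one RTT sample and return the new RTO."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        return self.rto

    @property
    def rto(self) -> float:
        if self.srtt is None:
            # No sample yet: RFC 6298 initial RTO of 1 second.
            return max(self.min_rto, 1.0)
        return max(self.min_rto,
                   self.srtt + max(self.granularity, K * self.rttvar))


def is_spurious_timeout(rto: float, network_rtt: float,
                        scheduling_delay: float) -> bool:
    """True if the timer fires although nothing was lost: the ACK would
    have arrived in time on the network alone, but the extra scheduling
    delay pushes its arrival past the RTO."""
    return network_rtt <= rto < network_rtt + scheduling_delay


# Hypothetical numbers: a quiet datacenter path and a descheduled VM.
est = RtoEstimator(min_rto=0.010)
for _ in range(50):
    est.update(0.0005)            # steady 0.5 ms RTT samples

print(is_spurious_timeout(est.rto, 0.0005, 0.030))  # VM descheduled for 30 ms
print(is_spurious_timeout(est.rto, 0.0005, 0.0))    # no scheduling delay
```

With steady samples the estimator settles at the configured floor, so any scheduling delay longer than that floor is enough to trigger a timeout without any packet loss in this simplified model.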


Published In

IEEE/ACM Transactions on Networking, Volume 24, Issue 4
August 2016
642 pages

Publisher

IEEE Press
