DOI: 10.1145/2523616.2523620
research-article

Small is better: avoiding latency traps in virtualized data centers

Published: 01 October 2013

    Abstract

    Public clouds have become a popular platform for building Internet-scale applications. Using virtualization, public cloud services grant customers full control of guest operating systems and applications, while service providers still retain the management of their host infrastructure. Because applications built with public clouds are often highly sensitive to response time, infrastructure builders strive to reduce the latency of their data center's internal network. However, most existing solutions require modification to the software stack controlled by guests. We introduce a new host-centric solution for improving latency in virtualized cloud environments. In this approach, we extend a classic scheduling principle, Shortest Remaining Time First, from the virtualization layer, through the host network stack, to the network switches. Experimental and simulation results show that our solution can reduce median latency of small flows by 40%, with improvements in the tail of almost 90%, while reducing throughput of large flows by less than 3%.
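
To make the scheduling principle named in the abstract concrete, here is a minimal sketch of preemptive Shortest Remaining Time First on a single server, compared with first-in-first-out service. It is a toy model only and is not the paper's implementation, which applies the idea across the hypervisor, host network stack, and switches. The function names, the `(arrival_time, size)` job format, and the integer time units are assumptions made for illustration.

```python
import heapq


def srtf_completion_times(jobs):
    """Simulate preemptive Shortest Remaining Time First on one server.

    jobs: list of (arrival_time, size) tuples (hypothetical format).
    Returns completion times in the same order as `jobs`.
    """
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    ready = []                      # min-heap of (remaining_work, job_index)
    done = [None] * len(jobs)
    now, nxt = 0, 0
    while nxt < len(order) or ready:
        if not ready:               # server idle: jump to the next arrival
            now = max(now, jobs[order[nxt]][0])
        # Admit every job that has arrived by `now`.
        while nxt < len(order) and jobs[order[nxt]][0] <= now:
            i = order[nxt]
            heapq.heappush(ready, (jobs[i][1], i))
            nxt += 1
        remaining, i = heapq.heappop(ready)
        # Run the shortest job until it finishes or a new job arrives,
        # at which point the scheduling decision is re-evaluated.
        next_arrival = jobs[order[nxt]][0] if nxt < len(order) else float("inf")
        run = min(remaining, next_arrival - now)
        now += run
        if run == remaining:
            done[i] = now
        else:
            heapq.heappush(ready, (remaining - run, i))
    return done


def fifo_completion_times(jobs):
    """Baseline: serve jobs to completion in arrival order."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    done = [None] * len(jobs)
    now = 0
    for i in order:
        arrival, size = jobs[i]
        now = max(now, arrival) + size
        done[i] = now
    return done


if __name__ == "__main__":
    # One large job followed closely by two small ones.
    jobs = [(0, 100), (1, 2), (2, 3)]
    print("SRTF:", srtf_completion_times(jobs))
    print("FIFO:", fifo_completion_times(jobs))
```

In this toy workload the two small jobs finish at times 3 and 6 under SRTF instead of 102 and 105 under FIFO, while the large job finishes at 105 instead of 100. That is the same qualitative trade-off the abstract reports: large latency gains for small flows at a small cost to large ones.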

    Published In

    SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing
    October 2013
    427 pages
    ISBN:9781450324281
    DOI:10.1145/2523616

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cloud computing
    2. latency
    3. virtualization

    Conference

    SOCC '13: ACM Symposium on Cloud Computing
    October 1-3, 2013
    Santa Clara, California

    Acceptance Rates

    SOCC '13 Paper Acceptance Rate 23 of 114 submissions, 20%;
    Overall Acceptance Rate 169 of 722 submissions, 23%
