DOI: 10.5555/3433701.3433743
research-article

Alita: comprehensive performance isolation through bias resource management for public clouds

Published: 09 November 2020
    Abstract

    The tenants of a public cloud platform share hardware resources on the same node, resulting in the potential for performance interference or even malicious attacks. By overusing the shared memory bus, last-level cache (LLC) and memory bandwidth, or power, a tenant can significantly degrade the performance of its neighbors on the same node. To eliminate such unfairness, we propose Alita, a runtime system consisting of an online interference identifier and an adaptive interference eliminator. The interference identifier monitors hardware and system-level event statistics to identify resource polluters. The eliminator improves the performance of normal applications by throttling the resource usage of polluters only. Specifically, Alita adopts bus lock sparsification, bias LLC/bandwidth isolation, and selective power throttling to restrict the resource usage of polluters. Results on an experimental platform and on an in-production cloud platform with 30,000 nodes demonstrate that, based on system-level knowledge, Alita significantly improves the performance of co-located virtual machines in the presence of resource polluters.
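    The identify-then-throttle split described in the abstract can be illustrated with a short Python sketch. This is not Alita's implementation: the per-VM counters (VmStats), the fixed thresholds, the 11-way LLC, and the two-way polluter slice are hypothetical values chosen for illustration, and a simple threshold test stands in for the identifier's actual detection logic. The sketch flags any VM that exceeds a threshold as a polluter, maps each over-used resource to the throttling technique the abstract names for it, and computes a biased LLC split using contiguous Intel CAT-style way masks so that only polluters are confined.

```python
from dataclasses import dataclass

@dataclass
class VmStats:
    """Hypothetical per-VM counters sampled over one monitoring interval."""
    name: str
    split_locks_per_s: float  # bus locks triggered by split-lock accesses
    llc_mb: float             # LLC occupancy in MB
    mem_bw_gbps: float        # memory bandwidth in GB/s
    power_w: float            # power attributed to the VM in watts

# Hypothetical per-resource limits; a real system would calibrate these.
THRESHOLDS = {
    "split_locks_per_s": 1000.0,
    "llc_mb": 8.0,
    "mem_bw_gbps": 10.0,
    "power_w": 40.0,
}

# Throttling technique named in the abstract for each over-used resource.
ACTIONS = {
    "split_locks_per_s": "bus lock sparsification",
    "llc_mb": "bias LLC isolation",
    "mem_bw_gbps": "bias bandwidth isolation",
    "power_w": "selective power throttling",
}

def identify_polluters(stats):
    """Return {vm_name: [over-used resources]} for VMs above any threshold."""
    polluters = {}
    for vm in stats:
        hogged = [res for res, limit in THRESHOLDS.items()
                  if getattr(vm, res) > limit]
        if hogged:
            polluters[vm.name] = hogged
    return polluters

def plan_actions(polluters):
    """Map each polluter to the throttling actions for its over-used resources."""
    return {vm: [ACTIONS[res] for res in hogged]
            for vm, hogged in polluters.items()}

def bias_way_masks(stats, polluters, total_ways=11, polluter_ways=2):
    """Confine polluters to a small shared slice of LLC ways.

    Normal VMs share the remaining ways. Both masks are contiguous runs of
    set bits, as Intel CAT capacity bitmasks require.
    """
    all_ways = (1 << total_ways) - 1
    polluter_mask = (1 << polluter_ways) - 1   # lowest polluter_ways ways
    normal_mask = all_ways & ~polluter_mask    # every other way
    return {vm.name: polluter_mask if vm.name in polluters else normal_mask
            for vm in stats}

stats = [
    VmStats("vm-a", 5.0, 3.0, 4.0, 20.0),        # well-behaved tenant
    VmStats("vm-b", 50000.0, 12.0, 18.0, 35.0),  # polluting tenant
]
polluters = identify_polluters(stats)
print(polluters)
# {'vm-b': ['split_locks_per_s', 'llc_mb', 'mem_bw_gbps']}
print(plan_actions(polluters))
# {'vm-b': ['bus lock sparsification', 'bias LLC isolation', 'bias bandwidth isolation']}
print({name: hex(mask) for name, mask in bias_way_masks(stats, polluters).items()})
# {'vm-a': '0x7fc', 'vm-b': '0x3'}
```

    Confining polluters to a small shared slice, rather than splitting the cache evenly among all tenants, is what makes the allocation biased in this sketch: normal VMs keep nine of the eleven ways no matter how many polluters are detected.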


    Cited By

    • (2021) Provisioning Differentiated Last-Level Cache Allocations to VMs in Public Clouds. Proceedings of the ACM Symposium on Cloud Computing, pages 319-334. DOI: 10.1145/3472883.3487006. Online publication date: 1-Nov-2021.


    Published In

    SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2020
    1454 pages
    ISBN: 9781728199986

    In-Cooperation

    • IEEE CS

    Publisher

    IEEE Press


    Conference

    SC '20

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 19
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 26 Jul 2024.
