Machine Learning Systems are Bloated and Vulnerable

Published: 13 June 2024

Abstract

Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from operating systems and applications to containers. Containers are lightweight virtualization technologies used to package code and dependencies, providing portable, reproducible and isolated environments. For their ease of use, data scientists often utilize machine learning containers to simplify their workflow. However, this convenience comes at a cost: containers are often bloated with unnecessary code and dependencies, resulting in very large sizes. In this paper, we analyze and quantify bloat in machine learning containers. We develop MMLB, a framework for analyzing bloat in software systems, focusing on machine learning containers. MMLB measures the amount of bloat at both the container and package levels, quantifying the sources of bloat. In addition, MMLB integrates with vulnerability analysis tools and performs package dependency analysis to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from TensorFlow, PyTorch, and Nvidia, we show that bloat accounts for up to 80% of machine learning container sizes, increasing container provisioning times by up to 370% and exacerbating vulnerabilities by up to 99%. For more detail, see the full paper [15].

References

[1]
Mohannad Alhanahnah, Rithik Jain, Vaibhav Rastogi, Somesh Jha, and Thomas Reps. 2022. Lightweight, Multi-Stage, Compiler-Assisted Application Specialization. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). 251–269. https://doi.org/10.1109/EuroSP53844.2022.00024
[2]
Anchore. 2022. Grype. https://github.com/anchore/grype. [Online; accessed 2022-10-27].
[3]
Cloud Architecture Center. 2023. MLOps: Continuous delivery and automation pipelines in machine learning. Google Cloud.
[4]
Conda. 2023. Conda Documentation. https://docs.conda.io/en/latest/. [Online; accessed 2023-12-15].
[5]
Ward Cunningham. 1992. The WyCash portfolio management system. ACM SIGPLAN OOPS Messenger, Vol. 4, 2 (1992), 29–30.
[6]
Klaus Haller. 2022. Managing AI in the Enterprise. Springer.
[7]
Kihong Heo, Woosuk Lee, Pardis Pashakhanloo, and Mayur Naik. 2018. Effective program debloating via reinforcement learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 380–394.
[8]
David OBrien, Sumon Biswas, Sayem Imtiaz, Rabe Abdalkareem, Emad Shihab, and Hridesh Rajan. 2022. 23 shades of self-admitted technical debt: an empirical study on machine learning software. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 734–746.
[9]
PIP. 2023. pip. https://pip.pypa.io/en/stable/. [Online; accessed 2023-12-15].
[10]
Chenxiong Qian, Hong Hu, Mansour Alharthi, Pak Ho Chung, Taesoo Kim, and Wenke Lee. 2019. RAZOR: A framework for post-deployment software debloating. In 28th USENIX Security Symposium (USENIX Security 19). 1733–1750.
[11]
David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden technical debt in machine learning systems. Advances in neural information processing systems, Vol. 28 (2015).
[12]
Aqua Security. 2022. Trivy. https://github.com/aquasecurity/trivy. [Online; accessed 2022-10-27].
[13]
Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, and Anita Raja. 2021. An empirical study of refactorings and technical debt in Machine Learning systems. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 238–250.
[14]
Ubuntu. 2023. Package Management. https://ubuntu.com/server/docs/package-management. [Online; accessed 2023-12-15].
[15]
Huaifeng Zhang, Mohannad Alhanahnah, Fahmi Abdulqadir Ahmed, Dyako Fatih, Philipp Leitner, and Ahmed Ali-Eldin. 2024. Machine Learning Systems are Bloated and Vulnerable. Proceedings of the ACM on Measurement and Analysis of Computing Systems, Vol. 8, 1 (2024), 1–30.

    Published In

    ACM SIGMETRICS Performance Evaluation Review, Volume 52, Issue 1
    SIGMETRICS '24
    June 2024
    104 pages
    DOI:10.1145/3673660
    Editor: Bo Ji

      SIGMETRICS/PERFORMANCE '24: Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems
      June 2024
      120 pages
      ISBN:9798400706240
      DOI:10.1145/3652963
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 June 2024
    Published in SIGMETRICS Volume 52, Issue 1

    Author Tags

    1. machine learning systems
    2. software debloating

    Qualifiers

    • Abstract
