DOI: 10.5555/2969442.2969519
Article

Hidden Technical Debt in Machine Learning Systems

Published: 07 December 2015
    Abstract

    Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
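    Entanglement, the first risk factor listed above, can be seen in even the smallest model; the paper summarizes it as "Changing Anything Changes Everything" (CACE). The sketch below is a minimal illustration, not the authors' code: it assumes a two-feature least-squares model without an intercept, made-up toy data, and hypothetical helper names (`fit_two_features`, `fit_one_feature`). Because the two inputs are strongly correlated, removing one feature changes the weight the model learns for the other, so no input can be modified in isolation.

```python
# Minimal sketch (hypothetical toy data and helper names, not from the paper)
# of entanglement: with correlated inputs, removing one feature changes the
# weight the model learns for another feature.


def dot(a, b):
    """Sum of elementwise products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def fit_two_features(x1, x2, y):
    """Least-squares weights (no intercept) for y ~ w1*x1 + w2*x2,
    solved from the 2x2 normal equations with Cramer's rule."""
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are perfectly collinear")
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


def fit_one_feature(x, y):
    """Least-squares weight (no intercept) for y ~ w*x."""
    return dot(x, y) / dot(x, x)


# Hypothetical, highly correlated features; the target depends on both equally.
x1 = [1, 2, 3, 4]
x2 = [1, 2, 3, 5]
y = [a + b for a, b in zip(x1, x2)]

w_both = fit_two_features(x1, x2, y)
w_x1_only = fit_one_feature(x1, y)

print("weights with x1 and x2:", w_both)              # (1.0, 1.0)
print("weight on x1 after dropping x2:", w_x1_only)   # about 2.133
```

    Dropping `x2` roughly doubles the weight on `x1`, because `x1` now has to absorb the signal the two correlated features previously shared. In a production system, the same effect means that deprecating, adding, or re-scaling one input can silently shift how every other input is used.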

    Cited By

    • (2024) Navigating Challenges and Technical Debt in Large Language Models Deployment. Proceedings of the 4th Workshop on Machine Learning and Systems, pp. 192-199. DOI: 10.1145/3642970.3655840. Online publication date: 22-Apr-2024.
    • (2024) Law and the Emerging Political Economy of Algorithmic Audits. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pp. 1255-1267. DOI: 10.1145/3630106.3658970. Online publication date: 3-Jun-2024.
    • (2023) Error discovery by clustering influence embeddings. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 41765-41777. DOI: 10.5555/3666122.3667931. Online publication date: 10-Dec-2023.

    Information

    Published In

    NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
    December 2015
    3626 pages

    Publisher

    MIT Press

    Cambridge, MA, United States