What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow

Abstract

Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that leads to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18,538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that do not, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to model training, Python basic syntax, parallelization, subprocess invocation, and external module execution. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions so that ML developers can better avoid/fix model-related exception patterns, data-related exception patterns, and multi-process-related exception patterns, respectively.
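
As a concrete illustration of the stack traces mined in this study, the following is a minimal sketch of how CPython-style traceback blocks could be located in the body of an SO post. The regular expression, the extract_stack_traces helper, and the toy Keras-style post are illustrative assumptions only and do not reproduce the actual extraction pipeline; the scripts used in the study are available in the replication package (MOOSELab 2022).

import re

# A CPython traceback block: a "Traceback (most recent call last):" header,
# one or more '  File "...", line N, in <name>' frames (each optionally
# followed by an indented source line), and a closing "SomeError: message" line.
TRACEBACK_RE = re.compile(
    r"Traceback \(most recent call last\):\n"
    r"(?:  File \"[^\"]+\", line \d+, in .+\n(?:    .+\n)?)+"
    r"\w+(?:\.\w+)*(?:Error|Exception)\b.*"
)

def extract_stack_traces(post_body):
    """Return every traceback-shaped block found in a Stack Overflow post body."""
    return [match.group(0) for match in TRACEBACK_RE.finditer(post_body)]

example_post = (
    "My Keras model crashes when I call fit():\n\n"
    "Traceback (most recent call last):\n"
    '  File "train.py", line 12, in <module>\n'
    "    model.fit(x, y)\n"
    '  File "keras/engine/training.py", line 108, in fit\n'
    "    raise ValueError(msg)\n"
    "ValueError: Input 0 is incompatible with the layer\n"
)
print(extract_stack_traces(example_post)[0])

Real posts vary widely (HTML markup, chained exceptions, truncated frames), so a pattern like this would need to be far more forgiving in practice.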

Data Availability

The data and scripts used to produce this work are shared in a replication package (MOOSELab 2022).

Notes

  1. https://data.stackexchange.com/stackoverflow/

  2. We used a small time overlap between the databases to ensure that we did not miss any SO posts. We removed any duplicates from our final dataset.

  3. https://data.stackexchange.com/stackoverflow/queries

  4. https://stackoverflow.com/tags

  5. Separating Text Blocks aids in determining the length of descriptions in SO questions.

  6. https://meta.stackexchange.com/questions/36728/how-are-the-number-of-views-in-a-question-calculated

  7. https://stackoverflow.com/questions/57192638

  8. https://stackoverflow.com/questions/63311732

  9. https://stackoverflow.com/questions/74960725

  10. https://stackoverflow.com/questions/71365812

  11. https://stackoverflow.com/questions/74911581

  12. https://stackoverflow.com/questions/70969833

  13. https://www.k-alpha.org/

  14. https://delvetool.com/blog/axialcoding

  15. https://stackoverflow.com/questions/43604917

  16. https://stackoverflow.com/questions/63481755

  17. https://stackoverflow.com/questions/52832028

  18. https://stackoverflow.com/questions/56622503

  19. https://www.tensorflow.org/api_docs/python/tf/config/threading

  20. https://stackoverflow.com/questions/56447556

  21. https://stackoverflow.com/questions/42223668

  22. https://stackoverflow.com/questions/73636723

  23. https://stackoverflow.com/questions/64273829

  24. https://www.tensorflow.org/datasets

  25. https://stackoverflow.com/questions/39321495

  26. https://stackoverflow.com/questions/64771558

References

  • Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: A system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283

  • Alharthi H, Outioua D, Baysal O (2016) Predicting questions’ scores on Stack Overflow. In: 2016 IEEE/ACM 3rd international workshop on crowdsourcing in software engineering (CSI-SE), pp 1–7

  • Alshangiti M, Sapkota H, Murukannaiah PK, Liu X, Yu Q (2019) Why is developing machine learning applications challenging? a study on Stack Overflow posts. In: 2019 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 1–11

  • Amershi S, Begel A, Bird C, DeLine R, Gall H, Kamar E, Nagappan N, Nushi B, Zimmermann T (2019) Software engineering for machine learning: A case study. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 291–300

  • Atwi H, Lin B, Tsantalis N, Kashiwa Y, Kamei Y, Ubayashi N, Bavota G, Lanza M (2021) Pyref: Refactoring detection in python projects. In: 2021 IEEE 21st international working conference on source code analysis and manipulation (SCAM), pp 136–141

  • Baltes S, Dumani L, Treude C, Diehl S (2018) SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts. In: Proceedings of the 15th international conference on mining software repositories

  • Bangash AA, Sahar H, Chowdhury S, Wong AW, Hindle A, Ali K (2019) What do developers know about machine learning: a study of ml discussions on stackoverflow. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE, pp 260–264

  • Bhat V, Gokhale A, Jadhav R, Pudipeddi J, Akoglu L (2014) Min (e) d your tags: Analysis of question response time in Stack Overflow. In: 2014 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM 2014). IEEE, pp 328–335

  • Borges H, Hora A, Valente MT (2016a) Predicting the popularity of github repositories. In: Proceedings of the The 12th international conference on predictive models and data analytics in software engineering, pp 1–10

  • Borges H, Hora A, Valente MT (2016b) Understanding the factors that impact the popularity of Github repositories. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 334–344

  • Braiek HB, Khomh F, Adams B (2018) The open-closed principle of modern machine learning frameworks. In: Proceedings of the 15th international conference on mining software repositories, pp 353–363

  • Burnard P (1991) A method of analysing interview transcripts in qualitative research. Nurse Educ Today 11(6):461–466

  • Castelvecchi D (2016) Can we open the black box of ai? Nat News 538(7623):20

  • Chandrasekar P (2020) Scripting the future of Stack Overflow. [Online]. Available: https://stackoverflow.blog/2020/01/21/scripting-the-future-of-stack-2020-plans-vision/

  • Chollet F et al (2015) Keras. [Online]. Available: https://github.com/fchollet/keras

  • Debbarma MK, Debbarma S, Debbarma N, Chakma K, Jamatia A (2013) A review and analysis of software complexity metrics in structural testing. Int J Comput Commun Eng 2(2):129–133

  • Deloitte (2020) Annual report. https://www2.deloitte.com/content/dam/Deloitte/dk/Documents/about-deloitte/Impact_Report_20_21_web.pdf

  • Dilhara M, Ketkar A, Dig D (2021) Understanding software-2.0: A study of machine learning library usage and evolution. ACM Trans Softw Eng Methodol (TOSEM) 30(4):1–42

  • Gao W, Wu J, Xu G (2022) Detecting duplicate questions in Stack Overflow via source code modeling. Int J Softw Eng Knowl Eng 32(02):227–255

  • Gupta S (2021) What is the best language for machine learning? [Online]. Available: https://www.springboard.com/blog/data-science/best-language-for-machine-learning/

  • Hamidi A, Antoniol G, Khomh F, Di Penta M, Hamidi M (2021) Towards understanding developers’ machine-learning challenges: A multi-language study on Stack Overflow. In: 2021 IEEE 21st international working conference on source code analysis and manipulation (SCAM). IEEE, pp 58–69

  • Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1(1):77–89

  • Huang C, Yao L, Wang X, Benatallah B, Sheng QZ (2017) Expert as a service: software expert recommendation via knowledge domain embeddings in Stack Overflow. In: 2017 IEEE international conference on web services (ICWS), pp 317–324

  • Humbatova N, Jahangirova G, Bavota G, Riccio V, Stocco A, Tonella P (2020) Taxonomy of real faults in deep learning systems. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 1110–1121

  • Islam MJ, Nguyen HA, Pan R, Rajan H (2019) What do developers ask about ML libraries? A large-scale study using Stack Overflow. Preprint arXiv:1906.11940

  • Islam MJ, Nguyen G, Pan R, Rajan H (2019) A comprehensive study on deep learning bug characteristics. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE), pp 510–520

  • Jordan MI, Mitchell TM (2015) Machine learning: Trends, perspectives, and prospects. Science 349(6245):255–260

  • Khandkar SH (2009) Open coding. University of Calgary, vol 23, p 2009

  • Kou B, Di Y, Chen M, Zhang T (2022) Sosum: A dataset of Stack Overflow post summaries. In: Proceedings of the 19th international conference on mining software repositories, ser. MSR ’22. New York, NY, USA: Association for Computing Machinery, pp 247–251. [Online]. Available: https://doi.org/10.1145/3524842.3528487

  • Krippendorff K (2011) Computing Krippendorff's alpha-reliability

  • LaMorte WW (2017) Mann whitney u test (wilcoxon rank sum test). [Online]. Available: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html

  • Liu J, Baltes S, Treude C, Lo D, Zhang Y, Xia X (2021) Characterizing search activities on Stack Overflow. In: Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, ser. ESEC/FSE 2021. New York, NY, USA: Association for Computing Machinery, pp 919–931

  • Loper E, Bird S (2002) Nltk: The natural language toolkit. Preprint arXiv:cs/0205028

  • Lune H, Berg BL (2017) Qualitative research methods for the social sciences. Pearson

  • Lyu Y, Li H, Sayagh M, Jiang ZM, Hassan AE (2021) An empirical study of the impact of data splitting decisions on the performance of aiops solutions. ACM Trans Softw Eng Methodol (TOSEM) 30(4):1–38

  • Majidi F, Openja M, Khomh F, Li H (2022) An empirical study on the usage of automated machine learning tools. Preprint arXiv:2208.13116

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60

  • Marzi G, Balzano M, Marchiori D (2024) K-Alpha Calculator - Krippendorff's Alpha Calculator: A user-friendly tool for computing Krippendorff's alpha inter-rater reliability coefficient. MethodsX 12:102545. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2215016123005411

  • Medeiros M, Kulesza U, Bonifacio R, Adachi E, Coelho R (2020) Improving bug localization by mining crash reports: An industrial study. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 766–775

  • Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Freeman J, Tsai D, Amde M, Owen S, Xin D, Xin R, Franklin MJ, Zadeh R, Zaharia M, Talwalkar A (2016) Mllib: Machine learning in apache spark. J Mach Learn Res 17(1):1235–1241

  • MOOSELab (2022) Replication: What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow. https://github.com/mooselab/ML_StackTrace

  • NewVantage Partners (2022) The quest to achieve data-driven leadership: A progress report on the state of corporate data initiatives. [Online]. Available: https://c6abb8db-514c-4f5b-b5a1-fc710f1e464e.filesusr.com/ugd/e5361a_2f859f3457f24cff9b2f8a2bf54f82b7.pdf

  • Nguyen G, Dlugolinsky S, Bobák M, Tran V, Lopez Garcia A, Heredia I, Malík P, Hluchỳ L (2019) Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 52(1):77–124

  • Stack Overflow (2021) Stack Overflow developer survey 2021. [Online]. Available: https://insights.stackoverflow.com/survey/2021#overview

  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, vol 32

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12:2825–2830

  • Raschka S, Patterson J, Nolet C (2020) Machine learning in python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information, vol 11, no 4. [Online]. Available: https://www.mdpi.com/2078-2489/11/4/193

  • Rubei R, Di Sipio C, Nguyen PT, Di Rocco J, Di Ruscio D (2020) Postfinder: Mining Stack Overflow posts to support software developers. Inf Softw Technol 127:106367

  • Sabor KK, Hamdaqa M, Hamou-Lhadj A (2020) Automatic prediction of the severity of bugs using stack traces and categorical features. Inf Softw Technol 123:106205

  • Stol K-J, Ralph P, Fitzgerald B (2016) Grounded theory in software engineering research: a critical review and guidelines. In: Proceedings of the 38th International conference on software engineering, pp 120–131

  • Sui L, Dietrich J, Tahir A (2017) On the use of mined stack traces to improve the soundness of statically constructed call graphs. In: 2017 24th Asia-Pacific software engineering conference (APSEC), pp 672–676

  • Sun X, Zhou T, Li G, Hu J, Yang H, Li B (2017) An empirical study on real bugs for machine learning programs. In: 2017 24th Asia-Pacific software engineering conference (APSEC). IEEE, pp 348–357

  • Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M et al (2019) Huggingface’s transformers: State-of-the-art natural language processing. Preprint arXiv:1910.03771

  • Xu S, Bennett A, Hoogeveen D, Lau JH, Baldwin T (2018) Preferred answer selection in Stack Overflow: Better text representations ... and metadata, metadata, metadata. In: Proceedings of the 2018 EMNLP workshop W-NUT: The 4th workshop on noisy user-generated text. Brussels, Belgium: Association for Computational Linguistics, pp 137–147. [Online]. Available: https://aclanthology.org/W18-6119

  • Zhang J, Wang Y, Yang D (2015) Ccspan: Mining closed contiguous sequential patterns. Knowl-Based Syst 89:1–13

  • Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th international symposium on software reliability engineering (ISSRE). IEEE, pp 104–115

  • Zhang R, Xiao W, Zhang H, Liu Y, Lin H, Yang M (2020) An empirical study on program failures of deep learning jobs. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, ser. ICSE ’20. New York, NY, USA: Association for Computing Machinery, pp 1159–1170

Acknowledgements

We would like to gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de recherche du Québec - Nature et technologies (FRQNT) for their funding support for this work.

Author information

Corresponding author

Correspondence to Amin Ghadesi.

Ethics declarations

Competing Interest

No financial support was provided by any institution for the compilation of this work.

Additional information

Communicated by: Davide Falessi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghadesi, A., Lamothe, M. & Li, H. What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow. Empir Software Eng 29, 107 (2024). https://doi.org/10.1007/s10664-024-10499-9
