What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow

Abstract

Machine learning (ML), including deep learning, has recently gained tremendous popularity in a wide range of applications. However, like traditional software, ML applications are not immune to the bugs that result from programming errors. Explicit programming errors usually manifest through error messages and stack traces. These stack traces describe the chain of function calls that leads to an anomalous situation, or exception. Indeed, these exceptions may cross the entire software stack (including applications and libraries). Thus, studying the ML-related patterns in stack traces can help practitioners and researchers understand the causes of exceptions in ML applications and the challenges faced by ML developers. To that end, we mine Stack Overflow (SO) and study 18,538 ML-related stack traces related to seven popular Python ML libraries. First, we observe that ML questions that contain stack traces are less likely to get accepted answers than questions that do not, even though they gain more attention (i.e., more views and comments). Second, we observe that recurrent patterns exist in ML stack traces, even across different ML libraries, with a small portion of patterns covering many stack traces. Third, we derive five high-level categories and 26 low-level types from the stack trace patterns: most patterns are related to model training, Python basic syntax, parallelization, subprocess invocation, and external module execution. Furthermore, the patterns related to external dependencies (e.g., file operations) or manipulations of artifacts (e.g., model conversion) are among the least likely to get accepted answers on SO. Our findings provide insights for researchers, ML library developers, and technical forum moderators to better support ML developers in writing error-free ML code. For example, future research can leverage the common patterns of stack traces to help ML developers locate solutions to problems similar to theirs or to identify experts who have experience solving similar patterns of problems. Researchers and ML library developers could prioritize efforts to help ML developers identify misuses of ML APIs, mismatches in data formats, and potential data/resource contentions so that ML developers can better avoid/fix model-related exception patterns, data-related exception patterns, and multi-process-related exception patterns, respectively.
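
As a concrete illustration of the stack traces mined in this study, the following is a minimal sketch of how CPython-style traceback blocks could be located in the body of an SO post. The regular expression, the extract_stack_traces helper, and the toy Keras-style post are illustrative assumptions only and do not reproduce the actual extraction pipeline; the scripts used in the study are available in the replication package (MOOSELab 2022).

import re

# A CPython traceback block: a "Traceback (most recent call last):" header,
# one or more '  File "...", line N, in <name>' frames (each optionally
# followed by an indented source line), and a closing "SomeError: message" line.
TRACEBACK_RE = re.compile(
    r"Traceback \(most recent call last\):\n"
    r"(?:  File \"[^\"]+\", line \d+, in .+\n(?:    .+\n)?)+"
    r"\w+(?:\.\w+)*(?:Error|Exception)\b.*"
)

def extract_stack_traces(post_body):
    """Return every traceback-shaped block found in a Stack Overflow post body."""
    return [match.group(0) for match in TRACEBACK_RE.finditer(post_body)]

example_post = (
    "My Keras model crashes when I call fit():\n\n"
    "Traceback (most recent call last):\n"
    '  File "train.py", line 12, in <module>\n'
    "    model.fit(x, y)\n"
    '  File "keras/engine/training.py", line 108, in fit\n'
    "    raise ValueError(msg)\n"
    "ValueError: Input 0 is incompatible with the layer\n"
)
print(extract_stack_traces(example_post)[0])

Real posts vary widely (HTML markup, chained exceptions, truncated frames), so a pattern like this would need to be far more forgiving in practice.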

Data Availability

The data and scripts used to produce this work are shared in a replication package (MOOSELab 2022).

Notes

  1. https://data.stackexchange.com/stackoverflow/

  2. We used a small time overlap between the databases to ensure that we did not miss any SO posts. We removed any duplicates from our final dataset.

  3. https://data.stackexchange.com/stackoverflow/queries

  4. https://stackoverflow.com/tags

  5. Separating Text Blocks aids in determining the length of descriptions in SO questions.

  6. https://meta.stackexchange.com/questions/36728/how-are-the-number-of-views-in-a-question-calculated

  7. https://stackoverflow.com/questions/57192638

  8. https://stackoverflow.com/questions/63311732

  9. https://stackoverflow.com/questions/74960725

  10. https://stackoverflow.com/questions/71365812

  11. https://stackoverflow.com/questions/74911581

  12. https://stackoverflow.com/questions/70969833

  13. https://www.k-alpha.org/

  14. https://delvetool.com/blog/axialcoding

  15. https://stackoverflow.com/questions/43604917

  16. https://stackoverflow.com/questions/63481755

  17. https://stackoverflow.com/questions/52832028

  18. https://stackoverflow.com/questions/56622503

  19. https://www.tensorflow.org/api_docs/python/tf/config/threading

  20. https://stackoverflow.com/questions/56447556

  21. https://stackoverflow.com/questions/42223668

  22. https://stackoverflow.com/questions/73636723

  23. https://stackoverflow.com/questions/64273829

  24. https://www.tensorflow.org/datasets

  25. https://stackoverflow.com/questions/39321495

  26. https://stackoverflow.com/questions/64771558

References

  • Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: A system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283

  • Alharthi H, Outioua D, Baysal O (2016) Predicting questions’ scores on Stack Overflow. In: 2016 IEEE/ACM 3rd international workshop on crowdsourcing in software engineering (CSI-SE), pp 1–7

  • Alshangiti M, Sapkota H, Murukannaiah PK, Liu X, Yu Q (2019) Why is developing machine learning applications challenging? a study on Stack Overflow posts. In: 2019 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 1–11

  • Amershi S, Begel A, Bird C, DeLine R, Gall H, Kamar E, Nagappan N, Nushi B, Zimmermann T (2019) Software engineering for machine learning: A case study. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 291–300

  • Atwi H, Lin B, Tsantalis N, Kashiwa Y, Kamei Y, Ubayashi N, Bavota G, Lanza M (2021) Pyref: Refactoring detection in python projects. In: 2021 IEEE 21st international working conference on source code analysis and manipulation (SCAM), pp 136–141

  • Baltes S, Dumani L, Treude C, Diehl S (2018) SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts. In: Proceedings of the 15th international conference on mining software repositories

  • Bangash AA, Sahar H, Chowdhury S, Wong AW, Hindle A, Ali K (2019) What do developers know about machine learning: a study of ml discussions on stackoverflow. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE, pp 260–264

  • Bhat V, Gokhale A, Jadhav R, Pudipeddi J, Akoglu L (2014) Min (e) d your tags: Analysis of question response time in Stack Overflow. In: 2014 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM 2014). IEEE, pp 328–335

  • Borges H, Hora A, Valente MT (2016a) Predicting the popularity of github repositories. In: Proceedings of the The 12th international conference on predictive models and data analytics in software engineering, pp 1–10

  • Borges H, Hora A, Valente MT (2016b) Understanding the factors that impact the popularity of Github repositories. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 334–344

  • Braiek HB, Khomh F, Adams B (2018) The open-closed principle of modern machine learning frameworks. In: Proceedings of the 15th international conference on mining software repositories, pp 353–363

  • Burnard P (1991) A method of analysing interview transcripts in qualitative research. Nurse Educ Today 11(6):461–466

  • Castelvecchi D (2016) Can we open the black box of ai? Nat News 538(7623):20

  • Chandrasekar P (2020) Scripting the future of Stack Overflow. [Online]. Available: https://stackoverflow.blog/2020/01/21/scripting-the-future-of-stack-2020-plans-vision/

  • Chollet F et al (2015) Keras. [Online]. Available: https://github.com/fchollet/keras

  • Debbarma MK, Debbarma S, Debbarma N, Chakma K, Jamatia A (2013) A review and analysis of software complexity metrics in structural testing. Int J Comput Commun Eng 2(2):129–133

  • Deloitte (2020) Annual report. https://www2.deloitte.com/content/dam/Deloitte/dk/Documents/about-deloitte/Impact_Report_20_21_web.pdf

  • Dilhara M, Ketkar A, Dig D (2021) Understanding software-2.0: A study of machine learning library usage and evolution. ACM Trans Softw Eng Methodol (TOSEM) 30(4):1–42

  • Gao W, Wu J, Xu G (2022) Detecting duplicate questions in Stack Overflow via source code modeling. Int J Softw Eng Knowl Eng 32(02):227–255

  • Gupta S (2021) What is the best language for machine learning? [Online]. Available: https://www.springboard.com/blog/data-science/best-language-for-machine-learning/

  • Hamidi A, Antoniol G, Khomh F, Di Penta M, Hamidi M (2021) Towards understanding developers’ machine-learning challenges: A multi-language study on Stack Overflow. In: 2021 IEEE 21st international working conference on source code analysis and manipulation (SCAM). IEEE, pp 58–69

  • Hayes AF, Krippendorff K (2007) Answering the call for a standard reliability measure for coding data. Commun Methods Meas 1(1):77–89

  • Huang C, Yao L, Wang X, Benatallah B, Sheng QZ (2017) Expert as a service: software expert recommendation via knowledge domain embeddings in Stack Overflow. In: 2017 IEEE international conference on web services (ICWS), pp 317–324

  • Humbatova N, Jahangirova G, Bavota G, Riccio V, Stocco A, Tonella P (2020) Taxonomy of real faults in deep learning systems. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 1110–1121

  • Islam MJ, Nguyen HA, Pan R, Rajan H (2019) What do developers ask about ML libraries? A large-scale study using Stack Overflow. Preprint arXiv:1906.11940

  • Islam MJ, Nguyen G, Pan R, Rajan H (2019) A comprehensive study on deep learning bug characteristics. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering (ESEC/FSE), pp 510–520

  • Jordan MI, Mitchell TM (2015) Machine learning: Trends, perspectives, and prospects. Science 349(6245):255–260

  • Khandkar SH (2009) Open coding. University of Calgary, vol 23, p 2009

  • Kou B, Di Y, Chen M, Zhang T (2022) Sosum: A dataset of Stack Overflow post summaries. In: Proceedings of the 19th international conference on mining software repositories, ser. MSR ’22. New York, NY, USA: Association for Computing Machinery, pp 247–251. [Online]. Available: https://doi.org/10.1145/3524842.3528487

  • Krippendorff K (2011) Computing Krippendorff's alpha-reliability

  • LaMorte WW (2017) Mann whitney u test (wilcoxon rank sum test). [Online]. Available: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html

  • Liu J, Baltes S, Treude C, Lo D, Zhang Y, Xia X (2021) Characterizing search activities on Stack Overflow. In: Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, ser. ESEC/FSE 2021. New York, NY, USA: Association for Computing Machinery, pp 919–931

  • Loper E, Bird S (2002) Nltk: The natural language toolkit. Preprint arXiv:cs/0205028

  • Lune H, Berg BL (2017) Qualitative research methods for the social sciences. Pearson

  • Lyu Y, Li H, Sayagh M, Jiang ZM, Hassan AE (2021) An empirical study of the impact of data splitting decisions on the performance of aiops solutions. ACM Trans Softw Eng Methodol (TOSEM) 30(4):1–38

  • Majidi F, Openja M, Khomh F, Li H (2022) An empirical study on the usage of automated machine learning tools. Preprint arXiv:2208.13116

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60

  • Marzi G, Balzano M, Marchiori D (2024) K-Alpha Calculator - Krippendorff's Alpha Calculator: A user-friendly tool for computing Krippendorff's alpha inter-rater reliability coefficient. MethodsX 12:102545. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2215016123005411

  • Medeiros M, Kulesza U, Bonifacio R, Adachi E, Coelho R (2020) Improving bug localization by mining crash reports: An industrial study. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 766–775

  • Meng X, Bradley J, Yavuz B, Sparks E, Venkataraman S, Liu D, Freeman J, Tsai D, Amde M, Owen S, Xin D, Xin R, Franklin MJ, Zadeh R, Zaharia M, Talwalkar A (2016) Mllib: Machine learning in apache spark. J Mach Learn Res 17(1):1235–1241

  • MOOSELab (2022) Replication: What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow. https://github.com/mooselab/ML_StackTrace

  • NewVantage Partners (2022) The quest to achieve data-driven leadership: A progress report on the state of corporate data initiatives. [Online]. Available: https://c6abb8db-514c-4f5b-b5a1-fc710f1e464e.filesusr.com/ugd/e5361a_2f859f3457f24cff9b2f8a2bf54f82b7.pdf

  • Nguyen G, Dlugolinsky S, Bobák M, Tran V, Lopez Garcia A, Heredia I, Malík P, Hluchỳ L (2019) Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 52(1):77–124

  • Stack Overflow (2021) Stack Overflow developer survey 2021. [Online]. Available: https://insights.stackoverflow.com/survey/2021#overview

  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, vol 32

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: Machine learning in python. J Mach Learn Res 12:2825–2830

  • Raschka S, Patterson J, Nolet C (2020) Machine learning in python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information, vol 11, no 4. [Online]. Available: https://www.mdpi.com/2078-2489/11/4/193

  • Rubei R, Di Sipio C, Nguyen PT, Di Rocco J, Di Ruscio D (2020) Postfinder: Mining Stack Overflow posts to support software developers. Inf Softw Technol 127:106367

  • Sabor KK, Hamdaqa M, Hamou-Lhadj A (2020) Automatic prediction of the severity of bugs using stack traces and categorical features. Inf Softw Technol 123:106205

  • Stol K-J, Ralph P, Fitzgerald B (2016) Grounded theory in software engineering research: a critical review and guidelines. In: Proceedings of the 38th International conference on software engineering, pp 120–131

  • Sui L, Dietrich J, Tahir A (2017) On the use of mined stack traces to improve the soundness of statically constructed call graphs. In: 2017 24th Asia-Pacific software engineering conference (APSEC), pp 672–676

  • Sun X, Zhou T, Li G, Hu J, Yang H, Li B (2017) An empirical study on real bugs for machine learning programs. In: 2017 24th Asia-Pacific software engineering conference (APSEC). IEEE, pp 348–357

  • Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M et al (2019) Huggingface’s transformers: State-of-the-art natural language processing. Preprint arXiv:1910.03771

  • Xu S, Bennett A, Hoogeveen D, Lau JH, Baldwin T (2018) Preferred answer selection in Stack Overflow: Better text representations ... and metadata, metadata, metadata. In: Proceedings of the 2018 EMNLP workshop W-NUT: The 4th workshop on noisy user-generated text. Brussels, Belgium: Association for Computational Linguistics, pp 137–147. [Online]. Available: https://aclanthology.org/W18-6119

  • Zhang J, Wang Y, Yang D (2015) Ccspan: Mining closed contiguous sequential patterns. Knowl-Based Syst 89:1–13

  • Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th international symposium on software reliability engineering (ISSRE). IEEE, pp 104–115

  • Zhang R, Xiao W, Zhang H, Liu Y, Lin H, Yang M (2020) An empirical study on program failures of deep learning jobs. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, ser. ICSE ’20. New York, NY, USA: Association for Computing Machinery, pp 1159–1170

Acknowledgements

We would like to gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de recherche du Québec - Nature et technologies (FRQNT) for their funding support for this work.

Author information

Corresponding author

Correspondence to Amin Ghadesi.

Ethics declarations

Competing Interest

No financial support was provided by any institution for the compilation of this work.

Additional information

Communicated by: Davide Falessi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ghadesi, A., Lamothe, M. & Li, H. What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow. Empir Software Eng 29, 107 (2024). https://doi.org/10.1007/s10664-024-10499-9
