Abstract
Detecting bug-inducing commits (BIC), also known as Just-in-Time (JIT) defect prediction, with Machine Learning (ML) based models requires tabulated feature values extracted from the source code or historical maintenance data of a software system. Existing studies have used meta-data from source code repositories (which we call GitHub Statistics, or GS), n-gram-based source code text processing, and developer information (e.g., a developer's experience) as feature values in ML-based bug detection models. However, these features do not capture the source code syntax styles or patterns that a developer might prefer over the valid alternatives a programming language provides. This study proposes a method to represent software commits with features extracted from their source code syntax patterns and investigates whether these features help detect bug proneness in software systems. We use six manually and two automatically labeled datasets from eight open-source software projects written in Java, C++, and Python. Our datasets contain 642 manually labeled and 4014 automatically labeled buggy and non-buggy commits from six and two subject systems, respectively. The subject systems span diverse numbers of revisions and various application domains. Our investigation shows that including the proposed features improves the performance of detecting buggy and non-buggy software commits with five different machine learning classification models. Our proposed features also perform better in detecting buggy commits when used with the Deep Belief Network-based feature generation and classification model. We also applied a state-of-the-art explainability tool to compare the explanations of predicted buggy commits using our proposed and traditional features and found that our proposed features provide better reasoning about buggy commit detection than the traditional features. Continuing this study can help enhance software effectiveness by identifying, minimizing, and fixing software bugs during maintenance and evolution.
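As a minimal sketch of the classification setup summarized above (the feature matrix and the particular classifiers below are illustrative assumptions, not the paper's exact pipeline or data), the following Python snippet evaluates several scikit-learn models on a tabulated matrix of commits labeled buggy or non-buggy:

```python
# Minimal sketch: comparing classifiers on tabulated commit features.
# The synthetic matrix stands in for extracted feature values
# (e.g., syntax-pattern counts, GS meta-data); it is NOT the authors' data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a commits-by-features table labeled buggy (1) / non-buggy (0).
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           weights=[0.7, 0.3], random_state=42)

classifiers = {
    "random_forest": RandomForestClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "naive_bayes": GaussianNB(),
}

# 10-fold cross-validated F1 score for each model, as a rough comparison.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```

In practice, the synthetic matrix would be replaced by the per-commit feature table (traditional features, the proposed syntax-pattern features, or both) so that the contribution of each feature set can be compared under the same evaluation protocol.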
Data availability
The datasets and source files generated and/or analyzed during this study are available in our GitHub repository (https://github.com/mnadims/bicDetectionSF/) for readers to investigate and to facilitate any replication study.
Funding
This research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants, and by an NSERC Collaborative Research and Training Experience (CREATE) grant.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nadim, M., Roy, B. Utilizing source code syntax patterns to detect bug inducing commits using machine learning models. Software Qual J 31, 775–807 (2023). https://doi.org/10.1007/s11219-022-09611-3