research-article

Cross-project smell-based defect prediction

Published: 01 November 2021

Abstract

Defect prediction is a technique introduced to optimize the testing phase of the software development pipeline by predicting which components of the software may contain defects. It trains a classifier on a set of features measured on each component of the target software project to predict whether a component may be defective. However, when defect information is not available for the target project, we must rely on an alternative approach that trains the classifier on data from external projects. This approach is called cross-project defect prediction. Bad code smells are a category of features that have previously been explored in defect prediction and have been shown to be good predictors of defects. Code smells are patterns of poor development practice in the code that indicate flaws in its design and implementation. Although they have been studied in the context of defect prediction, they have not been studied as features for cross-project defect prediction. In our experiment, we train defect prediction models for 100 projects to evaluate the predictive performance of bad code smells. We implemented four cross-project approaches known in the literature and compared the performance of 37 smells with that of 56 code metrics commonly used for defect prediction. The results show that cross-project defect prediction models trained with code smells achieved a significant 6.50% improvement in ROC AUC over models trained with the code metrics.
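The cross-project setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data is synthetic, the helper names `make_project` and `cross_project_auc` are hypothetical, and the random forest is a stand-in classifier. It is not one of the four cross-project approaches evaluated in the paper, and the features are neither the 37 smells nor the 56 metrics. The sketch only shows the general idea of training on labeled external projects and measuring ROC AUC on an unlabeled-at-training-time target project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def make_project(rng, n_components, n_features, shift):
    """Build one synthetic project (hypothetical data, not from the paper).

    Each row of X is a component; y marks it as defective (1) or clean (0).
    `shift` moves the feature distribution to mimic differences between projects.
    """
    X = rng.normal(loc=shift, scale=1.0, size=(n_components, n_features))
    # Assumption: defect-proneness is driven by the first two features.
    signal = (X[:, 0] - shift) + 0.5 * (X[:, 1] - shift)
    noise = rng.normal(scale=0.5, size=n_components)
    y = (signal + noise > 0.8).astype(int)
    return X, y


def cross_project_auc(source_projects, target_project, seed=0):
    """Train on the pooled source projects and score the target project."""
    X_train = np.vstack([X for X, _ in source_projects])
    y_train = np.concatenate([y for _, y in source_projects])

    # Stand-in classifier; the target project's labels are never used for training.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)

    X_target, y_target = target_project
    defect_probability = clf.predict_proba(X_target)[:, 1]
    return roc_auc_score(y_target, defect_probability)


rng = np.random.default_rng(42)
sources = [make_project(rng, 300, 5, shift) for shift in (0.0, 0.2, -0.2)]
target = make_project(rng, 200, 5, 0.1)
auc = cross_project_auc(sources, target)
print(f"Cross-project ROC AUC on the target project: {auc:.3f}")
```

Comparing two feature families under this setup would mean running `cross_project_auc` once per feature set on the same source and target projects and comparing the resulting ROC AUC values.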

Published In

Soft Computing - A Fusion of Foundations, Methodologies and Applications, Volume 25, Issue 22
Nov 2021
628 pages
ISSN:1432-7643
EISSN:1433-7479

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 November 2021
Accepted: 09 September 2021

Author Tags

  1. Cross-project defect prediction
  2. Defect prediction
  3. Code smell
  4. Mining software repositories
  5. Software quality
  6. Software engineering

Qualifiers

  • Research-article

Funding Sources

  • Cyber Security Research Center at the Ben-Gurion University of the Negev

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2025) Cross-project defect prediction based on autoencoder with dynamic adversarial adaptation. Applied Intelligence 55:5. DOI: 10.1007/s10489-024-06087-5. Online publication date: 1-Apr-2025
  • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024
  • (2024) Parameter tuning for software fault prediction with different variants of differential evolution. Expert Systems with Applications: An International Journal 237:PC. DOI: 10.1016/j.eswa.2023.121251. Online publication date: 1-Mar-2024
  • (2023) Issues-Driven features for software fault prediction. Information and Software Technology 155:C. DOI: 10.1016/j.infsof.2022.107102. Online publication date: 1-Mar-2023
  • (2023) Implicit and explicit mixture of experts models for software defect prediction. Software Quality Journal 31:4 (1331-1368). DOI: 10.1007/s11219-023-09640-6. Online publication date: 1-Dec-2023
  • (2022) Computational intelligence in software defects rules discovery. Soft Computing - A Fusion of Foundations, Methodologies and Applications 26:14 (6925-6939). DOI: 10.1007/s00500-021-06646-9. Online publication date: 1-Jul-2022
