Do different cross‐project defect prediction methods identify the same defective modules?

Published: 20 April 2020

Abstract

Cross‐project defect prediction (CPDP) is needed when target projects are new or have too little historical data of their own to build high‐quality prediction models. Researchers have proposed many CPDP methods, and previous studies have extensively compared their performance. However, to the best of our knowledge, it remains unclear whether different CPDP methods identify the same defective modules; this issue has not been thoroughly explored. In this article, we select 12 state‐of‐the‐art CPDP methods, including eight supervised methods and four unsupervised methods. We first compare the performance of these methods under the same experimental settings on five widely used datasets (i.e., NASA, SOFTLAB, PROMISE, AEEEM, and ReLink) and rank the methods via the Scott‐Knott test. The results confirm the competitiveness of the unsupervised methods. We then perform a diversity analysis on defective modules for these methods using the McNemar test. The empirical results show that different CPDP methods can differ in the modules they predict as defective, especially when supervised methods are compared with unsupervised methods. Finally, we find that a certain number of defective modules cannot be correctly identified by any of the CPDP methods, or can be correctly identified by only one CPDP method. These findings can be used to design more effective methods that further improve the performance of CPDP.
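
To make the diversity analysis described above concrete, the following sketch shows how the McNemar test can compare the defect predictions of two CPDP methods on the same set of target modules. It is a minimal illustration rather than the authors' implementation: the method names, prediction vectors, and the restriction of the comparison to truly defective modules are illustrative assumptions, and the test itself comes from the statsmodels library.

```python
# Minimal sketch (not the authors' code): McNemar-based diversity analysis of
# two hypothetical CPDP methods' predictions over the same target modules.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_diversity(pred_a, pred_b, truth):
    """Compare which truly defective modules each method identifies."""
    pred_a, pred_b, truth = map(np.asarray, (pred_a, pred_b, truth))
    defective = truth == 1                       # restrict to actual defective modules
    a_hit = pred_a[defective] == 1               # method A flags the module
    b_hit = pred_b[defective] == 1               # method B flags the module
    # 2x2 contingency table: rows = A hit/miss, columns = B hit/miss
    table = np.array([
        [np.sum(a_hit & b_hit),  np.sum(a_hit & ~b_hit)],
        [np.sum(~a_hit & b_hit), np.sum(~a_hit & ~b_hit)],
    ])
    result = mcnemar(table, exact=True)          # exact test on the discordant counts
    return table, result.pvalue

# Hypothetical labels and predictions for ten target modules (1 = defective)
truth  = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
pred_a = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
pred_b = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
table, p = mcnemar_diversity(pred_a, pred_b, truth)
print(table)
print(f"McNemar p-value: {p:.3f}")  # a small p-value indicates the methods tend to find different defects
```

In the article, this kind of pairwise comparison is carried out across all 12 methods and all five datasets; the sketch only shows the shape of a single comparison.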


Cited By

  • (2024) DevMuT: Testing Deep Learning Framework via Developer Expertise-Based Mutation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1533-1544. https://doi.org/10.1145/3691620.3695523. Online publication date: 27-Oct-2024.
  • (2022) Evolutionary Measures for Object-oriented Projects and Impact on the Performance of Cross-version Defect Prediction. Proceedings of the 13th Asia-Pacific Symposium on Internetware, pages 192-201. https://doi.org/10.1145/3545258.3545275. Online publication date: 11-Jun-2022.
  • (2022) Effort-aware cross-project just-in-time defect prediction framework for mobile apps. Frontiers of Computer Science: Selected Publications from Chinese Universities, 16(6). https://doi.org/10.1007/s11704-021-1013-5. Online publication date: 1-Dec-2022.

Published In

Journal of Software: Evolution and Process, Volume 32, Issue 5
May 2020
109 pages
ISSN: 2047-7473
EISSN: 2047-7481
DOI: 10.1002/smr.v32.5

Publisher

John Wiley & Sons, Inc.

United States

Publication History

Published: 20 April 2020

Author Tags

  1. cross‐project defect prediction
  2. diversity analysis
  3. empirical study
  4. software defect prediction

Qualifiers

  • Research-article
