Abstract
Although researchers have invested significant effort, the performance of defect prediction in a cross-project setting, i.e., with data that does not come from the same project, is still unsatisfactory. A recent proposal for improving defect prediction is the use of local models. With local models, the available data is first clustered into homogeneous regions, and separate classifiers are then trained for each region. Since the main problem of cross-project defect prediction is data heterogeneity, the idea of local models is promising. Therefore, we perform a conceptual replication of the previous studies on local models with a focus on cross-project defect prediction. In a large case study, we evaluate the performance of local models and investigate their advantages and drawbacks for cross-project predictions. To this end, we also compare their performance with that of a global model and a transfer learning technique designed for cross-project defect prediction. Our findings show that local models make only a minor difference in comparison to global models and transfer learning for cross-project defect prediction. While these results are negative, they provide valuable knowledge about the limitations of local models and increase the validity of previously gained research results.
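As an illustration of the local-model scheme described above, the following sketch clusters the training data into homogeneous regions via EM-based Gaussian mixtures and trains one classifier per region; test instances are routed to the classifier of their region. The use of scikit-learn, Gaussian mixtures, and logistic regression as base learner are assumptions of this sketch, not the exact setup of the study.

```python
# Minimal sketch of the local-model idea: cluster the training data into
# homogeneous regions, train one classifier per region, and route each test
# instance to the classifier of its region. Library and learner choices are
# assumptions of this sketch, not the setup used in the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def train_local_model(X_train, y_train, n_clusters=3):
    """Cluster the training data (EM) and fit one classifier per cluster."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X_train)
    regions = gmm.predict(X_train)
    # Global model as fallback for empty or single-class clusters.
    fallback = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    classifiers = {}
    for region in range(n_clusters):
        mask = regions == region
        if mask.any() and len(np.unique(y_train[mask])) >= 2:
            classifiers[region] = LogisticRegression(max_iter=1000).fit(
                X_train[mask], y_train[mask])
        else:
            classifiers[region] = fallback
    return gmm, classifiers


def predict_local_model(gmm, classifiers, X_test):
    """Assign each test instance to a region and use that region's classifier."""
    regions = gmm.predict(X_test)
    predictions = np.empty(len(X_test), dtype=int)
    for region, clf in classifiers.items():
        mask = regions == region
        if mask.any():
            predictions[mask] = clf.predict(X_test[mask])
    return predictions
```

In this picture, a global model corresponds to training a single classifier on all available data, while transfer learning approaches additionally adapt the training data to the target project before training.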
Notes
With the data used in the study and the success criterion of having both recall and precision of at least 0.75, they achieved a success rate of about 3 %.
The studies by Menzies et al. and Bettenburg et al. were first published in an initial version at a conference and then in greater detail in a journal publication, leading to five publications for the three studies.
Recall, precision, and accuracy all at least 0.75.
The tera-PROMISE repository is the successor of the PROMISE repository, which was previously located at http://promisedata.googlecode.com.
Instead of recall, sometimes PD or tpr are used in the literature. PD stands for probability of detection and tpr for true positive rate.
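For reference, with TP, FP, TN, and FN denoting the number of true positives, false positives, true negatives, and false negatives, these measures follow the standard definitions:

```latex
\mathit{recall} = \mathit{PD} = \mathit{tpr} = \frac{TP}{TP + FN}, \qquad
\mathit{precision} = \frac{TP}{TP + FP}, \qquad
\mathit{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```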
This problem is still very relevant. For example, during the 37th International Conference on Software Engineering held in May 2015, there were five papers on defect prediction (Caglayan et al. 2015; Ghotra et al. 2015; Peters et al. 2015; Tan et al. 2015; Tantithamthavorn et al. 2015). None of them used exactly the same performance measures.
References
Amasaki S, Kawata K, Yokogawa T (2015) Improving cross-project defect prediction methods with data simplification. In: 41st Euromicro conference on software engineering and advanced applications (SEAA)
Bettenburg N, Nagappan M, Hassan A (2012) Think locally, act globally: improving defect and effort prediction models. In: Proceedings of the 9th IEEE working conference on mining software repositories (MSR). IEEE Computer Society
Bettenburg N, Nagappan M, Hassan A (2014) Towards improving statistical modeling of software engineering data: think locally, act globally!. Empir Softw Eng:1–42
Caglayan B, Turhan B, Bener A, Habayeb M, Miranskyy A, Cialini E (2015) Merits of organizational metrics in defect prediction: an industrial replication. In: Proceedings of the 37th international conference on software engineering (ICSE)
Camargo Cruz AE, Ochimizu K (2009) Towards logistic regression models for predicting fault-prone code across software projects. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement (ESEM). IEEE Computer Society
Carver JC (2010) Towards reporting guidelines for experimental replications: a proposal. In: Proceedings of the international workshop on replication in empirical software engineering
Chidamber S, Kemerer C (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
D’Ambros M, Lanza M, Robbes R (2010) An Extensive Comparison of Bug Prediction Approaches. In: Proceedings of the 7th IEEE working conference on mining software repositories (MSR). IEEE Computer Society
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J of the Royal Statistical Society Series B (Methodological) 39(1):1–38
Drummond C, Holte RC (2003) C4.5, class imbalance and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on learning from imbalanced datasets II
Faloutsos C, Lin KI (1995) Fastmap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. SIGMOD Rec 24(2):163–174
Fraley C, Raftery AE (1999) MCLUST: software for model-based cluster analysis. J Classif 16(2):297–306
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE)
Gray D, Bowes D, Davey N, Sun Y, Christianson B (2011) The misuse of the NASA metrics data program data sets for automated software defect prediction. In: Proceedings of the 15th annual conference on evaluation & assessment in software engineering (EASE). IET
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1):10–18
Halstead MH (1977) Elements of software science (operating and programming systems series). Elsevier Science Inc.
Han J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann
Hassan A (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE), pp 78–88. doi:10.1109/ICSE.2009.5070510
He P, Li B, Ma Y (2014) Towards cross-project defect prediction with imbalanced feature sets. CoRR arXiv:1411.4228
He Z, Shu F, Yang Y, Li M, Wang Q (2012) An investigation on the feasibility of cross-project defect prediction. Autom Softw Eng 19:167–199
He Z, Peters F, Menzies T, Yang Y (2013) Learning from Open-Source projects: an empirical study on defect prediction. In: Proceedings of the 7th international symposium on empirical software engineering and measurement (ESEM)
Henderson-Sellers B (1996) Object-oriented metrics: measures of complexity. Prentice-Hall
Herbold S (2013) Training data selection for cross-project defect prediction. In: Proceedings of the 9th international conference on predictive models in software engineering (PROMISE), ACM
Herbold S (2015) Crosspare: a tool for benchmarking cross-project defect predictions. In: Proceedings of the 4th international workshop on software mining (SoftMine)
Huang L, Port D, Wang L, Xie T, Menzies T (2010) Text mining in supporting software systems risk assurance. In: Proceedings of the 25th IEEE/ACM international conference on automated software engineering (ASE), ACM
Jelihovschi E, Faria J, Allaman I (2014) Scottknott: a package for performing the Scott-Knott clustering algorithm in R. TEMA (São Carlos) 15:3–17
Jiang Y, Cukic B, Ma Y (2008) Techniques for evaluating fault prediction models. Empir Softw Eng 13(5):561–595
Jureczko M, Madeyski L (2010) Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th international conference on predictive models in software engineering (PROMISE), ACM
Kawata K, Amasaki S, Yokogawa T (2015) Improving relevancy filter methods for cross-project defect prediction. In: 3rd international conference on applied computing and information technology/2nd international conference on computational science and intelligence (ACIT-CSI)
Kitchenham B (2008) The role of replications in empirical software engineering – a word of warning. Empir Softw Eng 13(2):219–221
Kocaguneli E, Menzies T, Keung J, Cok D, Madachy R (2013) Active learning and effort estimation: Finding the essential content of software effort estimation data. IEEE Trans Softw Eng 39(8):1040–1053. doi:10.1109/TSE.2012.88
Kotsiantis S, Kanellopoulos D, Pintelas P (2006) Data preprocessing for supervised leaning. Int J Comp Sci 1(2):111–117
Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256
Madeyski L, Jureczko M (2015) Which process metrics can significantly improve defect prediction models? An empirical study. Softw Qual J 23(3):393–422. doi:10.1007/s11219-014-9241-7
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18(1):50–60
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320
Meneely A, Williams L, Snipes W, Osborne J (2008) Predicting failures with developer networks and social network analysis. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering, ACM, New York, NY, USA, SIGSOFT ’08/FSE-16, pp 13–23. doi:10.1145/1453101.1453106
Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proceedings of the 4th international workshop on predictor models in software engineering (PROMISE), ACM
Menzies T, Butcher A, Marcus A, Zimmermann T, Cok D (2011) Local vs. global models for effort estimation and defect prediction. In: Proceedings of the 26th IEEE/ACM international conference on automated software engineering (ASE), IEEE Computer Society
Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus global lessons for defect prediction and effort estimation. IEEE Trans Softw Eng 39(6):822–834
Menzies T, Pape C, Steele C (2014) tera-promise. http://openscience.us/repo/
Nam J, Kim S (2015) Heterogeneous defect prediction. In: Proceedings of the 10th joint meeting of the european software engineering conference (ESEC) and the ACM SIGSOFT symposium on the foundations of software engineering (FSE). doi:10.1145/2786805.2786814
Nam J, Pan SJ, Kim S (2013) Transfer defect learning. In: Proceedings of the 35th international conference on software engineering (ICSE)
Ngomo ACN (2009) Low-bias extraction of domain-specific concepts. Ph.D. Thesis
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Peters F, Menzies T, Gong L, Zhang H (2013) Balancing privacy and utility in cross-company defect prediction. IEEE Trans Softw Eng 39(8):1054–1068
Peters F, Menzies T, Layman L (2015) LACE2: better privacy-preserving data sharing for cross project defect prediction. In: Proceedings of the 37th international conference on software engineering (ICSE)
Premraj R, Herzig K (2011) Network versus code metrics to predict defects: a replication study. In: Proceedings of the international symposium on empirical software engineering and measurement (ESEM)
Rahman F, Posnett D, Devanbu P (2012) Recalling the “imprecision” of cross-project defect prediction. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering (FSE). ACM
Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14(2):131–164
Scanniello G, Gravino C, Marcus A, Menzies T (2013) Class level fault prediction using software clustering. In: Proceedings of the 28th IEEE/ACM international conference on automated software engineering (ASE). IEEE Computer Society
Schikuta E (1993) Grid-clustering: a hierarchical clustering method for very large data sets. In: Proceedings of the 15th international conference on pattern recognition
Schölkopf B, Smola AJ (2002) Learning with Kernels. MIT Press
Scott AJ, Knott M (1974) A cluster analysis method for grouping means in the analysis of variance. Biometrics 30(3):507–512
Shepperd M, Song Q, Sun Z, Mair C (2013) Data quality: some comments on the NASA software defect datasets. IEEE Trans Softw Eng 39(9):1208–1215
Shull F, Carver J, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Softw Eng 13(2):211–218
Siegmund J, Siegmund N, Apel S (2015) Views on internal and external validity in empirical software engineering. In: Proceedings of the 37th international conference on software engineering (ICSE)
Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the 37th international conference on software engineering (ICSE)
Tantithamthavorn C, McIntosh S, Hassan AE, Ihara A, Matsumoto Ki (2015) The impact of mislabelling on the performance and interpretation of defect prediction models. In: Proceedings of the 37th international conference on software engineering (ICSE)
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th international conference on software engineering (ICSE). ACM. doi:10.1145/2884781.2884857
Turhan B, Menzies T, Bener A, Di Stefano J (2009) On the relative value of cross-company and within-company data for defect prediction. Empir Softw Eng 14:540–578
van Gestel T, Suykens J, Baesens B, Viaene S, Vanthienen J, Dedene G, de Moor B, Vandewalle J (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54(1):5–32
Watanabe S, Kaiya H, Kaijiri K (2008) Adapting a fault prediction model to allow inter language reuse. In: Proceedings of the 4th international workshop on predictor models in software engineering (PROMISE). ACM
Xu R, Wunsch D II (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. In: Proceedings of the 11th working conference on mining software repositories (MSR). ACM
Zhang F, Mockus A, Keivanloo I, Zou Y (2015) Towards building a universal defect prediction model with rank transformed predictors. Empir Softw Eng:1–39. doi:10.1007/s10664-015-9396-2
Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B (2009) Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th joint meeting of the european software engineering conference (ESEC) and the ACM SIGSOFT symposium on the foundations of software engineering (FSE). ACM, pp 91–100
Communicated by: Burak Turhan
Appendix A: Metrics
A.1 JSTAT Data
The following metrics are part of the JSTAT data:
- WMC: weighted method count, number of methods in a class
- DIT: depth of inheritance tree
- NOC: number of children
- CBO: coupling between objects, number of classes coupled to a class
- RFC: response for class, number of different methods that can be executed if the class receives a message
- LCOM: lack of cohesion in methods, number of methods not related through the sharing of some of the class fields
- LCOM3: lack of cohesion in methods after Henderson-Sellers (1996), see the formula after this list
- NPM: number of public methods
- DAM: data access metric, ratio of private (protected) attributes to total number of attributes in the class
- MOA: measure of aggregation, number of class fields whose types are user defined classes
- MFA: measure of functional abstraction, ratio of the number of methods inherited by a class to the total number of methods accessible by the member methods of the class
- CAM: cohesion among methods of class, relatedness of methods based upon the parameter list of the methods
- IC: inheritance coupling, number of parent classes to which the class is coupled
- CBM: coupling between methods, number of new/redefined methods to which all the inherited methods are coupled
- AMC: average method complexity
- Ca: afferent couplings
- Ce: efferent couplings
- CC: cyclomatic complexity
- Max(CC): maximum cyclomatic complexity among methods
- Avg(CC): average cyclomatic complexity among methods
For a detailed explanation see Jureczko and Madeyski (2010).
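For reference, the lack-of-cohesion measure after Henderson-Sellers (1996) referred to as LCOM3 above is commonly given by the following formula; the symbol names are ours and this is the standard formulation, not reproduced from the paper:

```latex
\mathrm{LCOM3} = \frac{\frac{1}{a}\sum_{j=1}^{a}\mu(A_j) - m}{1 - m}
```

Here, $m$ denotes the number of methods of the class, $a$ the number of attributes, and $\mu(A_j)$ the number of methods that access attribute $A_j$.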
A.2 MDP Data
The following metrics are part of the MDP data. This is the common subset of metrics that is available for all projects within the MDP data set:
- LOC_TOTAL: total lines of code
- LOC_EXECUTABLE: executable lines of code
- LOC_COMMENTS: lines of comments
- LOC_CODE_AND_COMMENT: lines with comments or code
- NUM_UNIQUE_OPERATORS: number of unique operators
- NUM_UNIQUE_OPERANDS: number of unique operands
- NUM_OPERATORS: total number of operators
- NUM_OPERANDS: total number of operands
- HALSTEAD_VOLUME: Halstead volume (see Halstead 1977; the standard formulas are given after this list)
- HALSTEAD_LENGTH: Halstead length (see Halstead 1977)
- HALSTEAD_DIFFICULTY: Halstead difficulty (see Halstead 1977)
- HALSTEAD_EFFORT: Halstead effort (see Halstead 1977)
- HALSTEAD_ERROR_EST: Halstead error, also known as Halstead bug (see Halstead 1977)
- HALSTEAD_PROG_TIME: Halstead programming time (see Halstead 1977)
- BRANCH_COUNT: number of branches
- CYCLOMATIC_COMPLEXITY: cyclomatic complexity (same as CC in the JSTAT data)
- DESIGN_COMPLEXITY: design complexity
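For reference, the Halstead measures listed above follow from the operator and operand counts (NUM_UNIQUE_OPERATORS $\eta_1$, NUM_UNIQUE_OPERANDS $\eta_2$, NUM_OPERATORS $N_1$, NUM_OPERANDS $N_2$) via the standard definitions (Halstead 1977); the time and error estimates are given here in the commonly used conventions ($E/18$ seconds and $V/3000$), which may differ from the exact variant used for the data set:

```latex
\eta = \eta_1 + \eta_2, \qquad N = N_1 + N_2, \qquad V = N \log_2 \eta, \qquad
D = \frac{\eta_1}{2} \cdot \frac{N_2}{\eta_2}, \qquad
E = D \cdot V, \qquad T = \frac{E}{18}\,\text{s}, \qquad B \approx \frac{V}{3000}
```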
A.3 JPROC Data
The following metrics are part of the JPROC data:
- CBO: coupling between objects
- DIT: depth of inheritance tree
- fanIn: number of other classes that reference the class
- fanOut: number of other classes referenced by the class
- LCOM: lack of cohesion in methods
- NOC: number of children
- RFC: response for class
- WMC: weighted method count
- NOA: number of attributes
- NOAI: number of attributes inherited
- LOC: lines of code
- NOM: number of methods
- NOMI: number of methods inherited
- NOPRA: number of private attributes
- NOPRM: number of private methods
- NOPA: number of public attributes
- NOPM: number of public methods
- NR: number of revisions
- NREF: number of times the file has been refactored
- NAUTH: number of authors
- LADD: sum of lines added
- max(LADD): maximum lines added
- avg(LADD): average lines added
- LDEL: sum of lines removed
- max(LDEL): maximum lines deleted
- avg(LDEL): average lines deleted
- CHURN: sum of code churn
- max(CHURN): maximum code churn
- avg(CHURN): average code churn
- AGE: age of the file
- WAGE: weighted age of the file
For a detailed explanation see D’Ambros et al. (2010).
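The following sketch illustrates how the change-related metrics of the JPROC data could be derived from a per-file revision history. The record fields, the churn definition (lines added plus lines deleted per revision), and the age unit (weeks) are assumptions of this sketch and may differ from the exact definitions used for the data set by D'Ambros et al. (2010); this is not the original tooling.

```python
# Illustrative computation of change metrics for one file from its revisions.
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Revision:
    author: str
    date: datetime
    lines_added: int
    lines_deleted: int


def change_metrics(revisions, now=None):
    """Compute NR, NAUTH, LADD/LDEL/CHURN aggregates, and AGE for one file."""
    now = now or datetime.now()
    added = [r.lines_added for r in revisions]
    deleted = [r.lines_deleted for r in revisions]
    churn = [a + d for a, d in zip(added, deleted)]  # assumed churn definition
    age_weeks = (now - min(r.date for r in revisions)).days / 7.0
    return {
        "NR": len(revisions),
        "NAUTH": len({r.author for r in revisions}),
        "LADD": sum(added), "max(LADD)": max(added), "avg(LADD)": mean(added),
        "LDEL": sum(deleted), "max(LDEL)": max(deleted), "avg(LDEL)": mean(deleted),
        "CHURN": sum(churn), "max(CHURN)": max(churn), "avg(CHURN)": mean(churn),
        "AGE": age_weeks,
    }
```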
Cite this article
Herbold, S., Trautsch, A. & Grabowski, J. Global vs. local models for cross-project defect prediction. Empir Software Eng 22, 1866–1902 (2017). https://doi.org/10.1007/s10664-016-9468-y