DOI: 10.1145/1868328.1868336

Replication of defect prediction studies: problems, pitfalls and recommendations

Published: 12 September 2010

Abstract

Background: The main goal of the PROMISE repository is to enable reproducible, and thus verifiable or refutable, research. Over time, many data sets have become available, especially for defect prediction problems.
Aims: In this study, we investigate problems and pitfalls that may occur during replication. This information can inform future replication studies and serve as a guideline for researchers reporting novel results.
Method: We replicate two recent defect prediction studies that compare different data sets and learning algorithms, and we report missing information and problems encountered.
Results: Even with access to the original data sets, replicating previous studies may not lead to exactly the same results. The choice of evaluation procedures, performance measures, and presentation has a large influence on reproducibility. Additionally, we show that trivial and random models can be used to identify overly optimistic evaluation measures.
Conclusions: The best way to conduct easily reproducible studies is to share all associated artifacts, e.g., the scripts and programs used. When this is not an option, our results can be used to simplify the replication task for other researchers.
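
A minimal sketch (not from the paper) of the "trivial and random models" check: on a synthetic, imbalanced data set resembling typical defect data, a baseline that predicts every module as defective reaches perfect recall, while a majority-class baseline reaches high accuracy. Any evaluation measure on which such baselines already score well is a candidate for being overly optimistic. The sketch assumes Python with scikit-learn; all data and names are invented for illustration.

    # Illustration only: trivial and random baselines as a sanity check
    # for overly optimistic evaluation measures on imbalanced defect data.
    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score, recall_score, matthews_corrcoef

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 5))             # synthetic module metrics
    y = (rng.random(n) < 0.1).astype(int)   # ~10% defective modules (typical imbalance)

    baselines = {
        "trivial: all defective": DummyClassifier(strategy="constant", constant=1),
        "trivial: majority class": DummyClassifier(strategy="most_frequent"),
        "random (stratified)": DummyClassifier(strategy="stratified", random_state=0),
    }

    for name, clf in baselines.items():
        pred = clf.fit(X, y).predict(X)
        print(f"{name:25s} accuracy={accuracy_score(y, pred):.2f}  "
              f"recall={recall_score(y, pred, zero_division=0):.2f}  "
              f"MCC={matthews_corrcoef(y, pred):.2f}")

A model that barely outperforms such baselines under a given measure has not demonstrated much; measures that cannot separate real models from these baselines are the overly optimistic ones the abstract refers to.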



Published In

PROMISE '10: Proceedings of the 6th International Conference on Predictive Models in Software Engineering
September 2010
195 pages
ISBN: 9781450304047
DOI: 10.1145/1868328
  • General Chair: Tim Menzies
  • Program Chair: Gunes Koru
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. defect prediction model
  2. replication

Qualifiers

  • Research-article

Conference

PROMISE '10

Acceptance Rates

PROMISE '10 Paper Acceptance Rate: 19 of 53 submissions (36%)
Overall Acceptance Rate: 98 of 213 submissions (46%)


Cited By

  • Smell-Aware Bug Classification. IEEE Access, 12, 14061-14082 (2024). DOI: 10.1109/ACCESS.2023.3335175
  • Outlier Mining Techniques for Software Defect Prediction. In Software Quality: Higher Software Quality through Zero Waste Development, 41-60 (13 May 2023). DOI: 10.1007/978-3-031-31488-9_3
  • An Empirical Study of Model-Agnostic Techniques for Defect Prediction Models. IEEE Transactions on Software Engineering, 48(1), 166-185 (1 January 2022). DOI: 10.1109/TSE.2020.2982385
  • The Impact of Duplicate Changes on Just-in-Time Defect Prediction. IEEE Transactions on Reliability, 71(3), 1294-1308 (September 2022). DOI: 10.1109/TR.2021.3061618
  • The Impact of Parameters Optimization in Software Prediction Models. In 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 217-224 (August 2022). DOI: 10.1109/SEAA56994.2022.00041
  • Interpretability application of the Just-in-Time software defect prediction model. Journal of Systems and Software, 188(C) (1 June 2022). DOI: 10.1016/j.jss.2022.111245
  • Classifying crowdsourced mobile test reports with image features. Journal of Systems and Software, 184(C) (1 February 2022). DOI: 10.1016/j.jss.2021.111121
  • ST-TLF. Information and Software Technology, 149(C) (1 September 2022). DOI: 10.1016/j.infsof.2022.106939
  • Investigating replication challenges through multiple replications of an experiment. Information and Software Technology, 147(C) (1 July 2022). DOI: 10.1016/j.infsof.2022.106870
  • A Large Scale Study of Long-Time Contributor Prediction for GitHub Projects. IEEE Transactions on Software Engineering, 47(6), 1277-1298 (1 June 2021). DOI: 10.1109/TSE.2019.2918536
