Abstract
Cross-project defect prediction aims to predict whether the modules of a target project contain defects by using a prediction model trained on data from other projects. Because the data distributions of different projects diverge, cross-project defect prediction performs worse than within-project defect prediction. To reduce this divergence as much as possible, researchers have proposed a variety of methods that filter training data from the perspective of transfer learning. In this paper, we introduce a “project-instance-metric” hierarchical filtering strategy to select training data for the defect prediction model. The three-level filter selects, in turn, the candidate projects most similar to the target project, the instances most similar to the target instances, and the metrics most highly correlated with the prediction result. We compared three-level filtering with project-level filtering, instance-level filtering, and the combination of project-level and instance-level filtering, using four classification algorithms on NASA open-source data sets. Our experiments show that the three-level filtering method achieves higher F-measure and AUC values than the single-level training data filtering methods.
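The hierarchical strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity or correlation measures, so everything concrete here is an assumption made for illustration: Euclidean distance between project centroids at the project level, k-nearest-neighbour selection at the instance level, absolute Pearson correlation with the defect label at the metric level, and the hypothetical names `three_level_filter`, `n_projects`, `k`, and `n_metrics`. It is not the authors' implementation.

```python
from math import sqrt
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    """Column-wise mean of a list of equal-length metric vectors."""
    return [mean(col) for col in zip(*rows)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def three_level_filter(candidates, target_rows, n_projects=1, k=2, n_metrics=2):
    """Hypothetical project -> instance -> metric filter.

    candidates: dict mapping project name -> (rows, labels), labels in {0, 1}
    target_rows: unlabeled metric vectors of the target project
    """
    # Level 1 (project): keep the candidate projects whose centroid is
    # closest to the target centroid (assumed similarity measure).
    t_centroid = centroid(target_rows)
    ranked = sorted(
        candidates,
        key=lambda name: euclidean(centroid(candidates[name][0]), t_centroid),
    )
    kept_projects = ranked[:n_projects]
    pool = [
        (row, label)
        for name in kept_projects
        for row, label in zip(*candidates[name])
    ]

    # Level 2 (instance): for each target instance, keep its k nearest
    # instances from the pooled candidate data.
    selected = set()
    for t in target_rows:
        order = sorted(range(len(pool)), key=lambda i: euclidean(pool[i][0], t))
        selected.update(order[:k])
    rows = [pool[i][0] for i in sorted(selected)]
    labels = [pool[i][1] for i in sorted(selected)]

    # Level 3 (metric): keep the metrics with the highest absolute
    # correlation to the defect label (assumed correlation measure).
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_cols)]
    best = sorted(range(n_cols), key=lambda j: -scores[j])[:n_metrics]
    kept_metrics = sorted(best)
    filtered = [[r[j] for j in kept_metrics] for r in rows]
    return kept_projects, filtered, labels, kept_metrics
```

The function returns the retained project names, the filtered training rows, their labels, and the indices of the retained metrics, which could then be passed to any classifier. In practice the metrics would likely need normalising before distances are computed; that step is omitted here for brevity.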
Acknowledgements
This paper is partly supported by the Pre-research of Civil Spacecraft Technology (No. B0204).
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Yuan, C., Wang, X., Ke, X., Zhan, P. (2021). A Three-Level Training Data Filter for Cross-project Defect Prediction. In: Wu, Q., Zhao, K., Ding, X. (eds) Wireless and Satellite Systems. WiSATS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-69069-4_10
DOI: https://doi.org/10.1007/978-3-030-69069-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69068-7
Online ISBN: 978-3-030-69069-4
eBook Packages: Computer Science, Computer Science (R0)