A Three-Level Training Data Filter for Cross-project Defect Prediction

Yuan, Cangzhou; Wang, Xiaowei; Ke, Xinxin; Zhan, Panpan

doi:10.1007/978-3-030-69069-4_10

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 357)

Included in the following conference series:

International Conference on Wireless and Satellite Systems

Abstract

Cross-project defect prediction aims to predict whether the modules of a target project contain defects by using a model trained on data from other projects. Because data distributions diverge between projects, cross-project defect prediction performs worse than within-project defect prediction. To reduce this difference as much as possible, researchers have proposed a variety of methods that filter training data from the perspective of transfer learning. In this paper, we introduce a “project-instance-metric” hierarchical filtering strategy to select training data for the defect prediction model. The three-level filtering method selects, in turn, the candidate projects most similar to the target project, the instances most similar to the target instances, and the metrics most strongly correlated with the prediction result. We compared three-level filtering with project-level filtering, instance-level filtering, and the combination of project-level and instance-level filtering, using four classification algorithms on NASA open-source data sets. Our experiments show that the three-level filtering method achieves higher F-measure and AUC values than the single-level training data filtering methods.
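The abstract names the three filtering levels but not the measures used at each one, so the following is only a minimal sketch of how a project-instance-metric pipeline could be arranged. Everything concrete in it is an assumption, not the authors' method: project similarity is taken as the Euclidean distance between mean metric vectors, instance similarity as a k-nearest-neighbour selection around each target instance, and metric relevance as the absolute Pearson correlation with the defect label. All function and parameter names are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length metric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_projects(candidates, target_rows, top_n):
    """Project level (assumed measure): keep the top_n candidate projects
    whose mean metric vector lies closest to the target project's."""
    target_mean = mean_vector(target_rows)
    ranked = sorted(
        candidates,
        key=lambda name: euclidean(mean_vector(candidates[name][0]), target_mean),
    )
    return ranked[:top_n]


def select_instances(rows, labels, target_rows, k):
    """Instance level (assumed measure): keep the union of the k nearest
    training instances of every target instance."""
    keep = set()
    for t in target_rows:
        order = sorted(range(len(rows)), key=lambda i: euclidean(rows[i], t))
        keep.update(order[:k])
    idx = sorted(keep)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def select_metrics(rows, labels, top_m):
    """Metric level (assumed measure): keep the top_m metric columns with
    the largest absolute correlation to the defect label."""
    cols = list(zip(*rows))
    scores = [abs(pearson(col, labels)) for col in cols]
    ranked = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_m])


def three_level_filter(candidates, target_rows, top_n, k, top_m):
    """Apply the three levels in order: projects, then instances, then metrics.

    candidates maps a project name to (rows, labels); target_rows holds the
    unlabeled metric rows of the target project.
    """
    chosen = select_projects(candidates, target_rows, top_n)
    rows, labels = [], []
    for name in chosen:
        proj_rows, proj_labels = candidates[name]
        rows.extend(proj_rows)
        labels.extend(proj_labels)
    rows, labels = select_instances(rows, labels, target_rows, k)
    metric_idx = select_metrics(rows, labels, top_m)
    reduced = [[row[j] for j in metric_idx] for row in rows]
    return chosen, reduced, labels, metric_idx
```

The returned rows and labels would then be passed to whichever classifier is being evaluated, with the same metric indices applied to the target project's rows before prediction. The order of the levels follows the abstract. The cut-offs `top_n`, `k`, and `top_m` are free parameters of this sketch and would need tuning.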

Acknowledgements

This paper is partly supported by the Pre-research of Civil Spacecraft Technology (No. B0204).

Author information

Authors and Affiliations

School of Software, Beihang University, Beijing, 100191, China
Cangzhou Yuan, Xiaowei Wang & Xinxin Ke
Beijing Institute of Spacecraft System Engineering, Beijing, 100094, China
Panpan Zhan

Corresponding author

Correspondence to Cangzhou Yuan.

Editor information

Editors and Affiliations

Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
Qihui Wu
Nanjing University, Nanjing, China
Kanglian Zhao
Nanjing University of Posts and Telecommunications, Nanjing, China
Xiaojin Ding

About this paper

Cite this paper

Yuan, C., Wang, X., Ke, X., Zhan, P. (2021). A Three-Level Training Data Filter for Cross-project Defect Prediction. In: Wu, Q., Zhao, K., Ding, X. (eds) Wireless and Satellite Systems. WiSATS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-69069-4_10

DOI: https://doi.org/10.1007/978-3-030-69069-4_10
Published: 28 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69068-7
Online ISBN: 978-3-030-69069-4
eBook Packages: Computer Science, Computer Science (R0)
