Dictionary learning based software defect prediction

Authors:

Xiao-Yuan Jing,

Jin Liu

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering

Pages 414 - 423

https://doi.org/10.1145/2568225.2568320

Published: 31 May 2014

Abstract

Software defect prediction aims to improve software quality by automatically identifying defective modules so that testing effort can be spent efficiently. Classification methods based on static code attributes have attracted a great deal of attention for this task, and in recent years machine learning techniques have been applied to defect prediction. Because different software modules are similar to one another, a module can be approximately represented by a small proportion of other modules, and its representation coefficients over a pre-defined dictionary of historical software module data are generally sparse. In this paper, we propose to use dictionary learning to predict software defects. Using the characteristics of metrics mined from open source software, we learn multiple dictionaries (defective-module and defect-free-module sub-dictionaries, plus the total dictionary) together with sparse representation coefficients. We also take misclassification cost into account, because misclassifying a defective module generally incurs a much higher risk cost than misclassifying a defect-free one. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. Widely used datasets from NASA projects are employed as test data to evaluate all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.
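The sparse-representation idea in the abstract can be illustrated with a minimal sketch. This is not the CDDL algorithm from the paper: no dictionary is learned here (the atoms are simply normalized historical module vectors, i.e. the "pre-defined dictionary" case), the sparse coder is a plain greedy orthogonal matching pursuit, and the `cost_ratio` decision rule is an illustrative assumption about how a higher cost for missed defects might bias the decision. All function names, parameters, and the toy metric values are hypothetical.

```python
import numpy as np


def normalize_columns(M):
    """Scale each column (one module's metric vector) to unit L2 norm."""
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns unchanged
    return M / norms


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit.

    Approximates x using at most n_nonzero columns (atoms) of D and
    returns the sparse coefficient vector over all atoms.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -1.0  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def classify(x, D_clean, D_defective, n_nonzero=3, cost_ratio=1.0):
    """Label a module by comparing class-wise reconstruction residuals.

    cost_ratio >= 1 is an assumed knob: larger values make the rule more
    willing to flag a module as defective (missed defects cost more).
    """
    D = np.hstack([D_clean, D_defective])  # total dictionary
    coef = omp(D, x, n_nonzero)
    n_clean = D_clean.shape[1]
    r_clean = np.linalg.norm(x - D_clean @ coef[:n_clean])
    r_def = np.linalg.norm(x - D_defective @ coef[n_clean:])
    return "defective" if r_def <= cost_ratio * r_clean else "defect-free"


# Toy data: 4 hypothetical static code metrics per module, one column per module.
D_clean = normalize_columns(np.array([[2.0, 0.0], [0.0, 3.0], [0.0, 0.0], [0.0, 0.0]]))
D_defective = normalize_columns(np.array([[0.0, 0.0], [0.0, 0.0], [4.0, 0.0], [0.0, 5.0]]))

borderline = np.array([0.6, 0.0, 0.5, 0.0])
print(classify(borderline, D_clean, D_defective, n_nonzero=2, cost_ratio=1.0))
print(classify(borderline, D_clean, D_defective, n_nonzero=2, cost_ratio=2.0))
```

In this toy setup the borderline module is reconstructed slightly better by the defect-free atoms, so the unweighted rule labels it defect-free; doubling `cost_ratio` flips the label to defective, which mirrors the abstract's point that missing a defective module is the costlier error.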



Published In

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering

May 2014

1139 pages

ISBN:9781450327565

DOI:10.1145/2568225

General Chair: Pankaj Jalote (IIIT-Delhi, India)

Program Chairs: Lionel Briand (University of Luxembourg, Luxembourg), André van der Hoek (University of California, Irvine, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

TCSE: IEEE Computer Society's Technical Council on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICSE '14

Sponsor:

SIGSOFT

ICSE '14: 36th International Conference on Software Engineering

May 31 - June 7, 2014

Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
