DOI: 10.1145/2568225.2568320

Dictionary learning based software defect prediction

Published: 31 May 2014

Abstract

Software defect prediction aims to improve software quality by automatically identifying defective modules, so that testing effort can be directed efficiently. Classification methods based on static code attributes have attracted a great deal of attention for this task, and in recent years machine learning techniques have been applied to defect prediction. Because different software modules are similar to one another, a module can be approximately represented by a small proportion of other modules, and its representation coefficients over a pre-defined dictionary of historical software module data are generally sparse. In this paper, we propose to use dictionary learning to predict software defects. Using the characteristics of metrics mined from open source software, we learn multiple dictionaries (sub-dictionaries for defective and defect-free modules, plus the total dictionary) together with sparse representation coefficients. Moreover, we take misclassification cost into account, because misclassifying a defective module generally incurs a much higher risk cost than misclassifying a defect-free one. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. Widely used datasets from NASA projects are employed as test data to evaluate all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.
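
The idea of representing one module sparsely over a dictionary of historical modules, and then deciding its class from the defective and defect-free sub-dictionaries, can be illustrated with a small sketch. This is not the CDDL algorithm: no dictionary is learned, the sparse coder is plain matching pursuit, and the `cost_ratio` decision rule is an assumption made only for illustration. All function names, the toy metric vectors, and the parameters `n_iter` and `cost_ratio` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a metric vector to unit length so correlations are comparable.
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def matching_pursuit(atoms, x, n_iter):
    """Greedy sparse coding over unit-norm atoms; at most n_iter atoms get a nonzero coefficient."""
    residual = list(x)
    coef = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(a, residual) for a in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        coef[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coef

def class_residual(atoms, coef, x):
    # Reconstruction error when only one class's atoms and coefficients are used.
    approx = [sum(c * a[i] for c, a in zip(coef, atoms)) for i in range(len(x))]
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, approx)))

def classify(x, defective, clean, n_iter=2, cost_ratio=1.0):
    """Label x by the sub-dictionary that reconstructs it better.

    cost_ratio > 1 is an illustrative (assumed) way to make missing a
    defective module more expensive: it biases the decision toward "defective".
    """
    d_atoms = [normalize(m) for m in defective]
    c_atoms = [normalize(m) for m in clean]
    coef = matching_pursuit(d_atoms + c_atoms, x, n_iter)
    k = len(d_atoms)
    r_def = class_residual(d_atoms, coef[:k], x)
    r_clean = class_residual(c_atoms, coef[k:], x)
    return "defective" if r_def <= cost_ratio * r_clean else "defect-free"

# Toy historical modules, each described by three made-up metric values.
defective_modules = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
clean_modules = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]

if __name__ == "__main__":
    borderline = [0.45, 0.55, 0.0]
    print(classify(borderline, defective_modules, clean_modules, cost_ratio=1.0))
    print(classify(borderline, defective_modules, clean_modules, cost_ratio=2.0))
```

In this sketch a borderline module changes label as `cost_ratio` grows, which mirrors the motivation stated in the abstract: a missed defective module is treated as costlier than a false alarm.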


Published In

ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
May 2014
1139 pages
ISBN: 9781450327565
DOI: 10.1145/2568225
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Software defect prediction
  2. cost-sensitive discriminative dictionary learning (CDDL)
  3. dictionary learning
  4. sparse representation

Qualifiers

  • Research-article

Conference

ICSE '14

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
