research-article

An adaptive approach with active learning in software fault prediction

Authors:

Bojan Cukic

PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering

Pages 79 - 88

https://doi.org/10.1145/2365324.2365335

Published: 21 September 2012

Abstract

Background: Software quality prediction plays an important role in improving the quality of software systems. By mining software metrics, predictive models can be induced that provide software managers with insights into quality problems they need to tackle as effectively as possible.

Objective: Traditional supervised learning approaches dominate software quality prediction, but the resulting models tend to be project specific. Moreover, when no previous releases exist, supervised learning approaches are of limited use because large training data sets are needed to develop accurate predictive models.

Method: This paper eases the limitations of supervised learning approaches while still offering good prediction performance. We propose an adaptive approach in which supervised learning and active learning are coupled. A Naive Bayes classifier is used as the base learner.
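The following is a minimal sketch of the kind of loop the Method paragraph describes, not the authors' implementation. It assumes pool-based active learning with uncertainty sampling (the abstract does not name the query strategy) and a Gaussian Naive Bayes model over numeric metrics. All function names, the `oracle` callback, and the synthetic two-metric data are hypothetical and serve only as illustration.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Fit Gaussian Naive Bayes; returns {label: (prior, means, variances)}."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps variances positive when a class has few samples.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-3
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, variances)
    return model

def posterior_faulty(model, x):
    """Return P(label == 1 | x) under the fitted model."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, var in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[label] = s
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {k: math.exp(s - top) for k, s in log_scores.items()}
    return weights.get(1, 0.0) / sum(weights.values())

def accuracy(model, X, y):
    hits = sum((posterior_faulty(model, x) >= 0.5) == bool(t)
               for x, t in zip(X, y))
    return hits / len(y)

def adaptive_learning(X_seed, y_seed, X_pool, oracle, X_test, y_test, rounds=10):
    """Start from a supervised model, then query one label per round."""
    X_lab, y_lab = list(X_seed), list(y_seed)
    pool = list(range(len(X_pool)))
    history = []  # test accuracy before each query
    for _ in range(rounds):
        model = fit_gaussian_nb(X_lab, y_lab)
        history.append(accuracy(model, X_test, y_test))
        if not pool:
            break
        # Uncertainty sampling (assumed): pick the posterior closest to 0.5.
        pick = min(pool, key=lambda i: abs(posterior_faulty(model, X_pool[i]) - 0.5))
        pool.remove(pick)
        X_lab.append(X_pool[pick])
        y_lab.append(oracle(pick))  # the oracle stands in for a human inspector
    return history

def make_modules(n, rng):
    """Synthetic modules: faulty ones (label 1) have larger metric values."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        centre = 3.0 if label else 0.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_seed = [[0.0, 0.2], [0.5, -0.4], [3.1, 2.9], [2.6, 3.4]]
y_seed = [0, 0, 1, 1]
X_pool, y_pool = make_modules(60, rng)
X_test, y_test = make_modules(200, rng)
history = adaptive_learning(X_seed, y_seed, X_pool, lambda i: y_pool[i],
                            X_test, y_test)
print(history)
```

The `history` list holds one accuracy value per iteration, which is the quantity one would compare against a purely supervised baseline trained once on the seed data.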

Results: We track the performance at each iteration of the adaptive learning algorithm and compare it with the performance of supervised learning. Our results show that the proposed scheme provides good fault prediction performance over time, i.e., it eventually outperforms the corresponding supervised learning approach. In addition, the adaptive learning approach reduces the variance in prediction performance compared with the corresponding supervised learning algorithm.

Conclusion: The adaptive approach outperforms the corresponding supervised learning approach when both use Naive Bayes as the base learner. Additional research is needed to investigate whether this observation remains valid with other base classifiers.

Published In

PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering

September 2012

126 pages

ISBN:9781450312417

DOI:10.1145/2365324

Conference Chair:
Stefan Wagner
U Stuttgart

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PROMISE '12

PROMISE '12: 8th International Conference on Predictive Models in Software Engineering

September 21 - 22, 2012

Lund, Sweden

Acceptance Rates

PROMISE '12 Paper Acceptance Rate 12 of 24 submissions, 50%;

Overall Acceptance Rate 98 of 213 submissions, 46%

Bibliometrics

Total Citations: 22

Total Downloads: 391

Downloads (Last 12 months): 8

Downloads (Last 6 weeks): 0

Reflects downloads up to 23 Feb 2025

Cited By

Byrraju IJ S (2023). Enhancing Software Fault Localization Using Deep Learning Techniques. 2023 World Conference on Communication & Computing (WCONF), pages 1-5. Online publication date: 14-Jul-2023.
https://doi.org/10.1109/WCONF58270.2023.10235227

Sun X, Tu L, Zhang J, Cai J, Li B, Wang Y (2023). ASSBert. Journal of Information Security and Applications, 73:C. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1016/j.jisa.2023.103423

Mei Y, Liu X, Lu Z, Yang Y, Liu H, Zhou Y (2023). Cross-version defect prediction using threshold-based active learning. Journal of Software: Evolution and Process, 36:4. Online publication date: 2-Apr-2023.
https://dl.acm.org/doi/10.1002/smr.2563

Kang H, Lo D (2022). Active Learning of Discriminative Subgraph Patterns for API Misuse Detection. IEEE Transactions on Software Engineering, 48:8, pages 2761-2783. Online publication date: 1-Aug-2022.
https://doi.org/10.1109/TSE.2021.3069978

Guo Y, Hu Q, Cordy M, Papadakis M, Le Traon Y (2022). DRE: density-based data selection with entropy for adversarial-robust deep learning models. Neural Computing and Applications, 35:5, pages 4009-4026. Online publication date: 19-Oct-2022.
https://dl.acm.org/doi/10.1007/s00521-022-07812-2

Zhang J, Tu L, Cai J, Sun X, Li B, Chen W, Wang Y (2022). Vulnerability Detection for Smart Contract via Backward Bayesian Active Learning. Applied Cryptography and Network Security Workshops, pages 66-83. Online publication date: 20-Jun-2022.
https://dl.acm.org/doi/10.1007/978-3-031-16815-4_5

Liang M, Li D, Xu B, Zhao D, Yu X, Xiang J (2021). Within-Project Software Aging Defect Prediction Based on Active Learning. 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 1-8. Online publication date: Oct-2021.
https://doi.org/10.1109/ISSREW53611.2021.00037

Hu Q, Guo Y, Cordy M, Xie X, Ma W, Papadakis M, Traon Y, Grundy J (2021). Towards exploring the limitations of active learning. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, pages 917-929. Online publication date: 15-Nov-2021.
https://dl.acm.org/doi/10.1109/ASE51524.2021.9678672

Mi W, Li Y, Wang S (2020). Empirical evaluation of the active learning strategies on software defects prediction. 2020 6th International Symposium on System and Software Reliability (ISSSR), pages 83-89. Online publication date: Oct-2020.
https://doi.org/10.1109/ISSSR51244.2020.00021

Saputri T, Lee S (2020). The Application of Machine Learning in Self-Adaptive Systems: A Systematic Literature Review. IEEE Access, 8, pages 205948-205967. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.3036037
