Comparing negative binomial and recursive partitioning models for fault prediction

Authors: Elaine J. Weyuker, Thomas J. Ostrand, Robert M. Bell

PROMISE '08: Proceedings of the 4th international workshop on Predictor models in software engineering

Pages 3 - 10

https://doi.org/10.1145/1370788.1370792

Published: 12 May 2008

Abstract

Two different software fault prediction models have been used to predict the N% of the files of a large software system that are likely to contain the largest numbers of faults. We used the same predictor variables in a negative binomial regression model and a recursive partitioning model, and compared their effectiveness on three large industrial software systems. The negative binomial model identified files that contain 76 to 93 percent of the faults, and recursive partitioning identified files that contain 68 to 85 percent.
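
The evaluation measure described in the abstract, the share of all faults that falls in the N% of files a model ranks highest, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `percent_faults_in_top_files`, the default of 20 for N, and the fault counts and predictions for ten files are hypothetical and are not taken from the paper or its systems.

```python
from math import ceil


def percent_faults_in_top_files(predicted, actual, top_percent=20.0):
    """Percent of all actual faults found in the top `top_percent` of files,
    where files are ranked by predicted fault count, highest first.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total_faults = sum(actual)
    if total_faults == 0:
        return 0.0
    # Number of files selected, rounded up so a non-empty selection is made.
    n_selected = ceil(len(predicted) * top_percent / 100.0)
    # Rank file indices by predicted count, highest first.
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    captured = sum(actual[i] for i in ranked[:n_selected])
    return 100.0 * captured / total_faults


# Hypothetical data: actual fault counts for ten files and the
# predictions of two made-up models (not results from the paper).
actual_faults = [9, 0, 4, 0, 1, 0, 0, 2, 0, 0]
model_a = [7.5, 0.2, 3.1, 0.1, 0.9, 0.3, 0.1, 1.8, 0.2, 0.1]
model_b = [2.0, 0.2, 6.0, 0.1, 0.9, 3.0, 0.1, 1.8, 0.2, 0.1]

print(percent_faults_in_top_files(model_a, actual_faults))  # prints 81.25
print(percent_faults_in_top_files(model_b, actual_faults))  # prints 25.0
```

Under this measure, two models given the same predictor variables can be compared directly: the one whose top-ranked files hold a larger share of the faults is the more effective for directing testing effort.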

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Published In

PROMISE '08: Proceedings of the 4th international workshop on Predictor models in software engineering

May 2008, 108 pages

ISBN: 9781605580364

DOI: 10.1145/1370788

General Chair: Gary Boetticher, University of Houston - Clear Lake, USA

Program Chair: Tom Ostrand, AT&T, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '08: International Conference on Software Engineering

May 12 - 13, 2008

Leipzig, Germany

Acceptance Rates

PROMISE '08 Paper Acceptance Rate 13 of 16 submissions, 81%;

Overall Acceptance Rate 98 of 213 submissions, 46%
