Theory of relative defect proneness

Authors:

A. Güneş Koru,

Khaled El Emam,

Dongsong Zhang,

Divya Mathew

Empirical Software Engineering, Volume 13, Issue 5

Pages 473 - 498

https://doi.org/10.1007/s10664-008-9080-x

Published: 01 October 2008

Abstract

In this study, we investigated the functional form of the size-defect relationship for software modules through replicated studies conducted on ten open-source products. We consistently observed a power-law relationship where defect proneness increases at a slower rate compared to size. Therefore, smaller modules are proportionally more defect prone. We externally validated the application of our results for two commercial systems. Given limited and fixed resources for code inspections, there would be an impressive improvement in the cost-effectiveness, as much as 341% in one of the systems, if a smallest-first strategy were preferred over a largest-first one. The consistent results obtained in this study led us to state a theory of relative defect proneness (RDP): In large-scale software systems, smaller modules will be proportionally more defect-prone compared to larger ones. We suggest that practitioners consider our results and give higher priority to smaller modules in their focused quality assurance efforts.
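The reasoning behind the smallest-first recommendation can be illustrated with a small sketch. It assumes that expected defects grow as a power of module size with an exponent below one, so that defects per line fall as modules grow. The exponent `ALPHA`, the module sizes, and the inspection budget below are hypothetical values chosen for illustration; they are not data or parameters from the study, and the sketch does not reproduce the authors' statistical modeling or the 341% figure.

```python
# Illustrative sketch only. ALPHA, MODULE_SIZES, and BUDGET_LOC are made-up
# assumptions, not values taken from the study.

ALPHA = 0.5  # assumed power-law exponent, 0 < ALPHA < 1
MODULE_SIZES = [100, 100, 100, 100, 400, 400, 1600]  # hypothetical sizes in LOC
BUDGET_LOC = 1600  # hypothetical fixed inspection budget in LOC


def expected_defects(size_loc, alpha=ALPHA):
    """Expected defects under the assumed power law: defects ~ size ** alpha."""
    return size_loc ** alpha


def inspect(sizes, budget_loc, smallest_first):
    """Inspect whole modules in size order within a fixed LOC budget.

    Returns the total expected defects contained in the inspected modules.
    """
    covered = 0.0
    remaining = budget_loc
    for size in sorted(sizes, reverse=not smallest_first):
        if size > remaining:
            continue  # skip modules that no longer fit into the budget
        remaining -= size
        covered += expected_defects(size)
    return covered


if __name__ == "__main__":
    for size in sorted(set(MODULE_SIZES)):
        density = expected_defects(size) / size
        print(f"size={size:5d} LOC  expected defects per LOC={density:.4f}")
    small = inspect(MODULE_SIZES, BUDGET_LOC, smallest_first=True)
    large = inspect(MODULE_SIZES, BUDGET_LOC, smallest_first=False)
    print(f"smallest-first covers {small:.1f} expected defects")
    print(f"largest-first covers  {large:.1f} expected defects")
```

Under these assumed numbers, the smallest-first ordering reaches more expected defects for the same inspected volume, because each inspected line in a small module carries a higher expected defect count than a line in a large one. The size of the gain depends entirely on the assumed exponent and size distribution.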

Information & Contributors

Information

Published In

Empirical Software Engineering Volume 13, Issue 5

October 2008

124 pages

ISSN:1382-3256

Copyright © 2008 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States

Cited By

Abidi M, Rahman M, Openja M, Khomh F (2024) Design smells in multi-language systems and bug-proneness: a survival analysis. Empirical Software Engineering 29:5. doi:10.1007/s10664-024-10476-2. Online publication date: 3-Jul-2024.
https://dl.acm.org/doi/10.1007/s10664-024-10476-2
C. S, Menzies T (2023) Assessing the Early Bird Heuristic (for Predicting Project Quality). ACM Transactions on Software Engineering and Methodology 32:5 (1-39). doi:10.1145/3583565. Online publication date: 24-Jul-2023.
https://dl.acm.org/doi/10.1145/3583565
Pandey S, Tripathi A (2023) DBDNN-Estimator: A Cross-Project Number of Fault Estimation Technique. SN Computer Science 5:1. doi:10.1007/s42979-023-02364-1. Online publication date: 19-Nov-2023.
https://dl.acm.org/doi/10.1007/s42979-023-02364-1
Meng D, Guerriero M, Machiry A, Aghakhani H, Bose P, Continella A, Kruegel C, Vigna G, Cao J, Ho Au M, Lin Z, Yung M (2021) Bran: Reduce Vulnerability Search Space in Large Open Source Repositories by Learning Bug Symptoms. Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security (731-743). doi:10.1145/3433210.3453115. Online publication date: 24-May-2021.
https://dl.acm.org/doi/10.1145/3433210.3453115
Abidi M, Rahman M, Openja M, Khomh F (2021) Are Multi-Language Design Smells Fault-Prone? An Empirical Study. ACM Transactions on Software Engineering and Methodology 30:3 (1-56). doi:10.1145/3432690. Online publication date: 11-Feb-2021.
https://dl.acm.org/doi/10.1145/3432690
Lee J, Lee T, In H (2020) Topic Modeling Based Warning Prioritization from Change Sets of Software Repository. Journal of Computer Science and Technology 35:6 (1461-1479). doi:10.1007/s11390-020-0047-8. Online publication date: 30-Nov-2020.
https://dl.acm.org/doi/10.1007/s11390-020-0047-8
Sun C, Dai H, Liu H, Chen T, Cai K (2019) Adaptive Partition Testing. IEEE Transactions on Computers 68:2 (157-169). doi:10.1109/TC.2018.2866040. Online publication date: 1-Feb-2019.
https://dl.acm.org/doi/10.1109/TC.2018.2866040
Turabieh H, Mafarja M, Li X (2019) Iterated feature selection algorithms with layered recurrent neural network for software fault prediction. Expert Systems with Applications: An International Journal 122:C (27-42). doi:10.1016/j.eswa.2018.12.033. Online publication date: 15-May-2019.
https://dl.acm.org/doi/10.1016/j.eswa.2018.12.033
Johannes D, Khomh F, Antoniol G (2019) A large-scale empirical study of code smells in JavaScript projects. Software Quality Journal 27:3 (1271-1314). doi:10.1007/s11219-019-09442-9. Online publication date: 1-Sep-2019.
https://dl.acm.org/doi/10.1007/s11219-019-09442-9
Zhou Y, Yang Y, Lu H, Chen L, Li Y, Zhao Y, Qian J, Xu B (2018) How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction. ACM Transactions on Software Engineering and Methodology 27:1 (1-51). doi:10.1145/3183339. Online publication date: 16-Apr-2018.
https://dl.acm.org/doi/10.1145/3183339
