Theory of relative defect proneness

Published: 01 October 2008

Abstract

In this study, we investigated the functional form of the size-defect relationship for software modules through replicated studies conducted on ten open-source products. We consistently observed a power-law relationship where defect proneness increases at a slower rate compared to size. Therefore, smaller modules are proportionally more defect prone. We externally validated the application of our results for two commercial systems. Given limited and fixed resources for code inspections, there would be an impressive improvement in the cost-effectiveness, as much as 341% in one of the systems, if a smallest-first strategy were preferred over a largest-first one. The consistent results obtained in this study led us to state a theory of relative defect proneness (RDP): In large-scale software systems, smaller modules will be proportionally more defect-prone compared to larger ones. We suggest that practitioners consider our results and give higher priority to smaller modules in their focused quality assurance efforts.
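The reasoning behind the smallest-first recommendation can be illustrated with a toy calculation. The sketch below is not the authors' model or data: it assumes a hypothetical power law, expected defects = a * size^b with 0 < b < 1, and the constants, module sizes, and inspection budget are all invented for illustration. Under that assumption, expected defects per line fall as modules grow, so a fixed line budget spent on small modules covers more expected defects.

```python
# Toy illustration only: the power-law constants, module sizes, and budget
# are hypothetical and are not taken from the study.

def expected_defects(size, a=0.05, b=0.5):
    """Assumed power law: defects grow more slowly than size when 0 < b < 1."""
    return a * size ** b


def inspect(sizes, budget, smallest_first):
    """Spend a fixed line budget on whole modules in the chosen order.

    Stops at the first module that no longer fits in the remaining budget
    and returns the total expected defects in the inspected modules.
    """
    order = sorted(sizes, reverse=not smallest_first)
    remaining = budget
    covered = 0.0
    for size in order:
        if size > remaining:
            break
        remaining -= size
        covered += expected_defects(size)
    return covered


module_sizes = [50, 80, 120, 200, 400, 800, 1600, 3200]  # lines of code
budget = 3500  # lines that can be inspected

smallest = inspect(module_sizes, budget, smallest_first=True)
largest = inspect(module_sizes, budget, smallest_first=False)

print(f"smallest-first covers {smallest:.2f} expected defects")
print(f"largest-first covers  {largest:.2f} expected defects")
```

How large the gap is depends entirely on the assumed exponent and the size distribution; the figures reported in the abstract come from the authors' own analysis, not from a calculation like this one.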

Published In

Empirical Software Engineering, Volume 13, Issue 5
October 2008
124 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Open-source software
  2. Planning for software quality assurance
  3. Size-defect relationship
  4. Software inspections
  5. Software metrics
  6. Software reviews
  7. Software science
  8. Software testing

Qualifiers

  • Article
