
Analyzing and handling local bias for calibrating parametric cost estimation models

Published: 01 August 2013

Abstract

Context: Parametric cost estimation models must be continuously calibrated and improved to produce more accurate software estimates and to reflect changing software development contexts. Local calibration, in which a subset of model parameters is tuned, is a frequent practice when software organizations adopt parametric estimation models, as it increases model usability and accuracy. However, the cumulative effects of such local calibration practices on the evolution of general parametric models over time are not well understood.

Objective: This study aims to quantitatively analyze and effectively handle the local bias associated with historical cross-company data, thereby improving the usability of cross-company datasets for calibrating and maintaining parametric estimation models.

Method: We design and conduct three empirical studies to measure, analyze, and address local bias in a cross-company dataset: (1) defining a method for measuring the local bias associated with each organization's data subset within the overall dataset; (2) analyzing the impact of local bias on the performance of an estimation model; and (3) proposing a weighted sampling approach to handle local bias. The studies are conducted on the latest COCOMO II calibration dataset.

Results: Our results show that local bias is pervasive in the cross-company dataset and that it negatively impacts the performance of the parametric model. The local-bias-based weighted sampling technique helps reduce these negative impacts on model performance.

Conclusion: Local bias in cross-company data harms model calibration and adds noise to model maintenance. The proposed local bias measure offers a means to quantify the degree of local bias associated with a cross-company dataset and to assess its influence on parametric model performance. The local-bias-based weighted sampling technique can be applied to trade off and mitigate the risk of significant local bias, which otherwise limits the usability of cross-company data for general parametric model calibration and maintenance.
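The abstract does not give the paper's formal definitions, so the following is only an illustrative sketch of the general idea: it assumes local bias can be proxied by the gap between an organization's mean log-productivity and the cross-company mean, and draws a calibration sample with weights inversely related to that gap. The function names, the toy data, and the bias formula are hypothetical, not the authors' actual method.

```python
import random
from collections import defaultdict
from math import log

def local_bias(projects):
    """Per-organization bias, sketched as |org mean - global mean| of log(effort/size)."""
    ratios = [(org, log(effort / size)) for org, size, effort in projects]
    global_mean = sum(r for _, r in ratios) / len(ratios)
    by_org = defaultdict(list)
    for org, r in ratios:
        by_org[org].append(r)
    return {org: abs(sum(rs) / len(rs) - global_mean) for org, rs in by_org.items()}

def weighted_sample(projects, n, seed=0):
    """Draw a calibration subset, down-weighting projects from high-bias organizations."""
    bias = local_bias(projects)
    weights = [1.0 / (1.0 + bias[org]) for org, _, _ in projects]
    return random.Random(seed).choices(projects, weights=weights, k=n)

# Toy cross-company data: (organization, size in KSLOC, effort in person-months)
projects = [
    ("A", 10, 30), ("A", 20, 58),
    ("B", 15, 90), ("B", 8, 50),
    ("C", 12, 40),
]
bias = local_bias(projects)
subset = weighted_sample(projects, n=4)
```

In this sketch, organizations whose productivity deviates most from the cross-company norm contribute fewer projects to the calibration sample, which is one plausible way to realize the trade-off the conclusion describes.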




Published In

Information and Software Technology, Volume 55, Issue 8
August 2013
175 pages

Publisher

Butterworth-Heinemann, United States


Author Tags

  1. COCOMO II
  2. Effort estimation
  3. Local bias
  4. Model maintenance
  5. Parametric model
  6. Weighted sampling

Qualifiers

  • Research-article


Cited By

  • (2019) Investigating the use of duration-based windows and estimation by analogy for COCOMO. Journal of Software: Evolution and Process, 31(10). DOI: 10.1002/smr.2176. Online publication date: 25-Oct-2019.
  • (2018) Evaluation of Software Quality for Competition-based Software Crowdsourcing Projects. Proceedings of the 2018 7th International Conference on Software and Computer Applications, pp. 102-109. DOI: 10.1145/3185089.3185152. Online publication date: 8-Feb-2018.
  • (2017) Research patterns and trends in software effort estimation. Information and Software Technology, 91(C), 1-21. DOI: 10.1016/j.infsof.2017.06.002. Online publication date: 1-Nov-2017.
  • (2017) Are delayed issues harder to resolve? Revisiting cost-to-fix of defects throughout the lifecycle. Empirical Software Engineering, 22(4), 1903-1935. DOI: 10.1007/s10664-016-9469-x. Online publication date: 1-Aug-2017.
  • (2016) Local-based active classification of test report to assist crowdsourced testing. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 190-201. DOI: 10.1145/2970276.2970300. Online publication date: 25-Aug-2016.
  • (2016) Metaheuristic optimization of multivariate adaptive regression splines for predicting the schedule of software projects. Neural Computing and Applications, 27(8), 2229-2240. DOI: 10.1007/s00521-015-2003-z. Online publication date: 1-Nov-2016.
  • (2015) Neural networks for predicting the duration of new software projects. Journal of Systems and Software, 101(C), 127-135. DOI: 10.1016/j.jss.2014.12.002. Online publication date: 1-Mar-2015.
  • (2015) Predictive accuracy comparison between neural networks and statistical regression for development effort of software projects. Applied Soft Computing, 27(C), 434-449. DOI: 10.1016/j.asoc.2014.10.033. Online publication date: 1-Feb-2015.
