DOI: 10.1145/2365324.2365333
research-article

Alternative methods using similarities in software effort estimation

Published: 21 September 2012

Abstract

A large variety of methods has been proposed in the Software Cost Estimation literature to increase accuracy when predicting the effort of developing new projects. Estimation by Analogy has been one of the most studied techniques in this area over the last 20 years. The popularity of the methodology can be explained by its agreement with human problem solving, its straightforward interpretation, and its accuracy, which is usually comparable to that of other methodologies. Furthermore, the methodology is essentially a special case of non-parametric regression, easily implementable and free of theoretical assumptions, based on the notion of "similarity", which is used to define "neighbors". All of these reasons led us to study the technique in more depth, considering alternative ways to exploit similarities in order to assign weights to neighbors. In this paper, our aim is to review the existing weighting practices and explore some new iterative procedures from matrix algebra that transform a similarity matrix into a bi-stochastic matrix (a matrix whose rows and columns each sum to 1). Specifically, we apply algorithms such as Sinkhorn-Knopp and Bregmanian Bi-Stochastication to similarity matrices of well-known software cost datasets in order to derive matrices that assign weights to the neighbors used for effort estimates. We investigate the sensitivity of the results with respect to the similarity function, focusing on a Gaussian kernel matrix with a tuning parameter. The promising results show that the new methods deserve a more thorough investigation and can be considered a generalization of the Estimation by Analogy method.
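The pipeline outlined in the abstract (Gaussian kernel similarities, bi-stochastic scaling, weighted effort estimates) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy feature matrix, the effort values, the names `gaussian_similarity`, `sinkhorn_knopp` and `estimate_efforts`, the choice `sigma=1.0`, and the step that zeroes the diagonal so a project is never its own neighbor are all assumptions made for illustration. Only the alternating row and column normalization is the standard Sinkhorn-Knopp iteration.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Gaussian kernel similarities between the rows of X.

    sigma plays the role of the tuning parameter (value chosen arbitrarily here).
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def sinkhorn_knopp(S, tol=1e-9, max_iter=1000):
    """Alternately normalize rows and columns of a positive matrix
    until both sets of sums are (approximately) 1."""
    W = np.asarray(S, dtype=float).copy()
    for _ in range(max_iter):
        W /= W.sum(axis=1, keepdims=True)  # make every row sum to 1
        W /= W.sum(axis=0, keepdims=True)  # make every column sum to 1
        if np.abs(W.sum(axis=1) - 1.0).max() < tol:
            break
    return W

def estimate_efforts(W, efforts):
    """Estimate each project's effort as a weighted mean of the others.

    Assumption: the diagonal is zeroed and rows are renormalized so that
    a project does not contribute to its own estimate.
    """
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    W /= W.sum(axis=1, keepdims=True)
    return W @ efforts

# Hypothetical toy data: 5 projects, 2 scaled features, known efforts.
X = np.array([[0.10, 0.20],
              [0.40, 0.10],
              [0.35, 0.80],
              [0.90, 0.70],
              [0.60, 0.50]])
efforts = np.array([120.0, 300.0, 450.0, 900.0, 520.0])

S = gaussian_similarity(X, sigma=1.0)
W = sinkhorn_knopp(S)
estimates = estimate_efforts(W, efforts)
```

Because every estimate is a convex combination of the other projects' efforts, it always lies between the smallest and largest observed effort. Varying `sigma` changes how sharply the weights concentrate on the nearest neighbors, which is the kind of sensitivity the abstract refers to.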


Published In

PROMISE '12: Proceedings of the 8th International Conference on Predictive Models in Software Engineering
September 2012
126 pages
ISBN:9781450312417
DOI:10.1145/2365324
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bregmanian bi-stochastication
  2. Sinkhorn-Knopp
  3. estimation by analogy
  4. iterative algorithms
  5. software effort estimation

Qualifiers

  • Research-article

Conference

PROMISE '12

Acceptance Rates

PROMISE '12 Paper Acceptance Rate: 12 of 24 submissions, 50%
Overall Acceptance Rate: 98 of 213 submissions, 46%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 30 Jan 2025

Cited By

  • (2024) Prediction of hydraulic conductivity of sand with multivariate-index properties using optimal machine-learning-based regression models. Environmental Earth Sciences, 83:18. DOI: 10.1007/s12665-024-11840-7. Online publication date: 3-Sep-2024
  • (2023) A novel approach for quality control of automated production lines working under highly inconsistent conditions. Engineering Applications of Artificial Intelligence, 122:C. DOI: 10.1016/j.engappai.2023.106149. Online publication date: 1-Jun-2023
  • (2021) Machine Learning Protocols in Early Cancer Detection Based on Liquid Biopsy: A Survey. Life, 11:7 (638). DOI: 10.3390/life11070638. Online publication date: 30-Jun-2021
  • (2017) An Empirical Analysis of Three-Stage Data-Preprocessing for Analogy-Based Software Effort Estimation on the ISBSG Data. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), 442-449. DOI: 10.1109/QRS.2017.54. Online publication date: Jul-2017
  • (2017) A stability assessment of solution adaptation techniques for analogy-based software effort estimation. Empirical Software Engineering, 22:1 (474-504). DOI: 10.1007/s10664-016-9434-8. Online publication date: 1-Feb-2017
  • (2016) LSA-X: Exploiting Productivity Factors in Linear Size Adaptation for Analogy-Based Software Effort Estimation. IEICE Transactions on Information and Systems, E99.D:1 (151-162). DOI: 10.1587/transinf.2015EDP7237. Online publication date: 2016
  • (2015) An empirical analysis of data preprocessing for machine learning-based software cost estimation. Information and Software Technology, 67:C (108-127). DOI: 10.1016/j.infsof.2015.07.004. Online publication date: 1-Nov-2015
  • (2014) Scaling up analogy-based software effort estimation: a comparison of multiple hadoop implementation schemes. Proceedings of the International Workshop on Innovative Software Development Methodologies and Practices, 65-72. DOI: 10.1145/2666581.2666582. Online publication date: 16-Nov-2014
