Abstract
In previous studies we have reported our concerns about the reliability and validity of the evaluation procedures used in comparative studies of competing effort prediction models. In particular, we have questioned the use of accuracy statistics to rank and select models, a concern reinforced by the persistent lack of consistent findings across such studies. This study offers further insight into the causes of conclusion instability by elaborating on our previous findings on the reliability and validity of the evaluation procedures. We show that model selection based on the accuracy statistics MMRE, MMER, MBRE, and MIBRE contributes to conclusion instability as well as to the selection of inferior models. We argue, and show, that the evaluation procedure must also assess whether the functional form of the prediction model makes sense, in order to better guard against selecting inferior models.
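For concreteness, the sketch below shows how these four accuracy statistics are commonly defined in the effort-estimation literature. The function name and the example values are illustrative, not taken from the paper:

```python
import numpy as np

def accuracy_statistics(actual, predicted):
    """Compute the four accuracy statistics named in the abstract.

    Common definitions (y = actual effort, yhat = predicted effort):
      MRE_i  = |y_i - yhat_i| / y_i              -> MMRE  (mean MRE)
      MER_i  = |y_i - yhat_i| / yhat_i           -> MMER  (mean MER)
      BRE_i  = |y_i - yhat_i| / min(y_i, yhat_i) -> MBRE  (mean BRE)
      IBRE_i = |y_i - yhat_i| / max(y_i, yhat_i) -> MIBRE (mean IBRE)
    """
    y = np.asarray(actual, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    abs_err = np.abs(y - yhat)
    return {
        "MMRE": np.mean(abs_err / y),
        "MMER": np.mean(abs_err / yhat),
        "MBRE": np.mean(abs_err / np.minimum(y, yhat)),
        "MIBRE": np.mean(abs_err / np.maximum(y, yhat)),
    }

# Illustrative usage: actual vs. predicted effort (person-hours)
print(accuracy_statistics([100, 250, 400], [120, 200, 380]))
```

Note that MRE is bounded by 1 for underestimates but unbounded for overestimates, so MMRE tends to favor models that underestimate; asymmetries of this kind are one reason ranking models by such statistics can be misleading.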
Notes
The 300 papers are journal papers from selected journals. In addition, there is an unknown number of conference papers.
Additional information
Editors: Martin Shepperd and Tim Menzies
Cite this article
Myrtveit, I., Stensrud, E. Validity and reliability of evaluation procedures in comparative studies of effort prediction models. Empir Software Eng 17, 23–33 (2012). https://doi.org/10.1007/s10664-011-9183-7