Validity and reliability of evaluation procedures in comparative studies of effort prediction models

Published: 01 February 2012

Abstract

In previous studies we have reported our findings and concerns regarding the reliability and validity of the evaluation procedures used in comparative studies of competing effort prediction models. In particular, we have raised concerns about the use of accuracy statistics to rank and select models, a concern strengthened by the observed lack of consistent findings across studies. This study offers further insight into the causes of conclusion instability by elaborating on our previous findings concerning the reliability and validity of the evaluation procedures. We show that model selection based on the accuracy statistics MMRE, MMER, MBRE, and MIBRE contributes to conclusion instability as well as to the selection of inferior models. We argue, and show, that the evaluation procedure must include an assessment of whether the functional form of the prediction model makes sense, in order to better prevent the selection of inferior models.
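For readers unfamiliar with the four accuracy statistics named in the abstract, the sketch below computes them from paired actual and predicted effort values, using their standard definitions from the effort-estimation literature (MRE relative to the actual, MER relative to the estimate, BRE relative to the smaller of the two, and its inverted variant relative to the larger). The function names and toy data are illustrative, not taken from the paper.

```python
# Illustrative sketch of the four accuracy statistics (standard definitions;
# not code from the paper). Each statistic averages a per-project relative
# error over all projects.

def mmre(actual, predicted):
    # Mean Magnitude of Relative Error: |y - yhat| / y
    return sum(abs(y - p) / y for y, p in zip(actual, predicted)) / len(actual)

def mmer(actual, predicted):
    # Mean Magnitude of Error Relative to the estimate: |y - yhat| / yhat
    return sum(abs(y - p) / p for y, p in zip(actual, predicted)) / len(actual)

def mbre(actual, predicted):
    # Mean Balanced Relative Error: |y - yhat| / min(y, yhat)
    return sum(abs(y - p) / min(y, p) for y, p in zip(actual, predicted)) / len(actual)

def mibre(actual, predicted):
    # Mean Inverted Balanced Relative Error: |y - yhat| / max(y, yhat)
    return sum(abs(y - p) / max(y, p) for y, p in zip(actual, predicted)) / len(actual)

# Toy example: one project overestimated, one underestimated.
actual = [100.0, 200.0]
predicted = [150.0, 100.0]
print(mmre(actual, predicted))   # 0.5   ((50/100 + 100/200) / 2)
print(mmer(actual, predicted))   # ~0.667 ((50/150 + 100/100) / 2)
```

Note how MMRE and MMER rank the same predictions differently (0.5 vs. roughly 0.667 here): the statistics penalize over- and underestimation asymmetrically, which is one reason model rankings based on them can be unstable.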



Published In

Empirical Software Engineering, Volume 17, Issue 1-2, February 2012, 127 pages

Publisher

Kluwer Academic Publishers

United States


Author Tags

  1. Comparative studies
  2. Effort prediction
  3. Evaluation criteria
  4. MMRE
  5. Mean magnitude of relative error
  6. Software cost estimation

Qualifiers

  • Article

Cited By

  • (2025) Ensembling Harmony Search Algorithm with case-based reasoning for software development effort estimation. Cluster Computing 28(2). doi:10.1007/s10586-024-04858-w
  • (2024) TSoptEE: two-stage optimization technique for software development effort estimation. Cluster Computing 27(7):8889-8908. doi:10.1007/s10586-024-04418-2
  • (2023) SEGRESS: Software Engineering Guidelines for REporting Secondary Studies. IEEE Transactions on Software Engineering 49(3):1273-1298. doi:10.1109/TSE.2022.3174092
  • (2020) Data-driven benchmarking in software development effort estimation. Journal of Software: Evolution and Process 32(9). doi:10.1002/smr.2258
  • (2019) Software Development Effort Estimation Using Regression Fuzzy Models. Computational Intelligence and Neuroscience 2019. doi:10.1155/2019/8367214
  • (2019) A systematic literature review of software effort prediction using machine learning methods. Journal of Software: Evolution and Process 31(10). doi:10.1002/smr.2211
  • (2018) Duplex output software effort estimation model with self-guided interpretation. Information and Software Technology 94:1-13. doi:10.1016/j.infsof.2017.09.010
  • (2018) The state-of-the-art in software development effort estimation. Journal of Software: Evolution and Process 30(12). doi:10.1002/smr.1983
  • (2017) Using bad learners to find good configurations. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 257-267. doi:10.1145/3106237.3106238
  • (2017) On the Evaluation of Effort Estimation Models. Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 41-50. doi:10.1145/3084226.3084260
