Abstract
In previous studies we have reported our concerns about the reliability and validity of the evaluation procedures used in comparative studies of competing effort prediction models. In particular, we have questioned the use of accuracy statistics to rank and select models, a concern reinforced by the persistent lack of consistent findings across such studies. This study offers further insight into the causes of conclusion instability by elaborating on our previous findings on the reliability and validity of the evaluation procedures. We show that model selection based on the accuracy statistics MMRE, MMER, MBRE, and MIBRE contributes to conclusion instability as well as to the selection of inferior models. We argue, and show, that the evaluation procedure must also assess whether the functional form of the prediction model makes sense, in order to better guard against selecting inferior models.
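For concreteness, the sketch below shows how these four accuracy statistics are commonly defined in the effort-estimation literature. The function name and the example values are illustrative, not taken from the paper:

```python
import numpy as np

def accuracy_statistics(actual, predicted):
    """Compute the four accuracy statistics named in the abstract.

    Common definitions (y = actual effort, yhat = predicted effort):
      MRE_i  = |y_i - yhat_i| / y_i              -> MMRE  (mean MRE)
      MER_i  = |y_i - yhat_i| / yhat_i           -> MMER  (mean MER)
      BRE_i  = |y_i - yhat_i| / min(y_i, yhat_i) -> MBRE  (mean BRE)
      IBRE_i = |y_i - yhat_i| / max(y_i, yhat_i) -> MIBRE (mean IBRE)
    """
    y = np.asarray(actual, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    abs_err = np.abs(y - yhat)
    return {
        "MMRE": np.mean(abs_err / y),
        "MMER": np.mean(abs_err / yhat),
        "MBRE": np.mean(abs_err / np.minimum(y, yhat)),
        "MIBRE": np.mean(abs_err / np.maximum(y, yhat)),
    }

# Illustrative usage: actual vs. predicted effort (person-hours)
print(accuracy_statistics([100, 250, 400], [120, 200, 380]))
```

Note that MRE is bounded by 1 for underestimates but unbounded for overestimates, so MMRE tends to favor models that underestimate; asymmetries of this kind are one reason ranking models by such statistics can be misleading.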
Notes
The 300 papers are journal papers from selected journals. In addition, there is an unknown number of conference papers.
Additional information
Editors: Martin Shepperd and Tim Menzies
Cite this article
Myrtveit, I., Stensrud, E. Validity and reliability of evaluation procedures in comparative studies of effort prediction models. Empir Software Eng 17, 23–33 (2012). https://doi.org/10.1007/s10664-011-9183-7