DOI: 10.1145/1852786.1852789

Can we evaluate the quality of software engineering experiments?

Published: 16 September 2010

Abstract

Context: The authors wanted to assess whether the quality of published human-centric software engineering experiments was improving. This required a reliable means of assessing the quality of such experiments. Aims: The aims of the study were to confirm the usability of a quality evaluation checklist, to determine how many reviewers are needed per paper that reports an experiment, and to specify an appropriate process for evaluating quality. Method: Eight reviewers evaluated four papers describing human-centric software engineering experiments using a quality checklist with nine questions. We conducted the study in two parts: the first was based on individual assessments and the second on collaborative evaluations. Results: Inter-rater reliability was poor for individual assessments but much better for joint evaluations. Four reviewers working in two pairs with discussion were more reliable than eight reviewers with no discussion. The sum of the nine criteria was more reliable than individual questions or a simple overall assessment. Conclusions: If quality evaluation is critical, more than two reviewers are required and a round of discussion is necessary. We advise using quality criteria and basing the final assessment on the sum of the aggregated criteria. Our results are limited by the small number of papers used and the relatively extensive expertise of the reviewers. In addition, the results of the second part of the study could have been affected by the removal of the time restriction on the review and by the consultation process.
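
The abstract does not name the reliability statistic the authors used, so the sketch below is only illustrative: it sums the nine checklist answers into a per-paper quality score (the aggregation the study found most reliable) and computes Kendall's coefficient of concordance (W) as one common way to gauge agreement among several reviewers scoring the same papers. The simulated data, function names, and the choice of W are assumptions for illustration, not the paper's reported method.

```python
# Illustrative sketch only; the checklist answers, the use of Kendall's W,
# and all names below are assumptions, not the method reported in the paper.
import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """Kendall's coefficient of concordance for a (reviewers x papers)
    matrix of scores; 0 = no agreement, 1 = perfect agreement.
    (No tie correction, for brevity.)"""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape                                   # m reviewers, n papers
    ranks = np.vstack([rankdata(row) for row in scores])  # rank the papers per reviewer
    rank_sums = ranks.sum(axis=0)                         # rank sum per paper
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical data: 8 reviewers each answer 9 yes/no checklist questions
# (1 = criterion met) for 4 papers, mirroring the study's set-up.
rng = np.random.default_rng(0)
answers = rng.integers(0, 2, size=(8, 4, 9))

# Aggregate each reviewer's nine answers into a total quality score per paper,
# the summary the study found more reliable than any single question.
totals = answers.sum(axis=2)            # shape (8 reviewers, 4 papers)

print(f"Kendall's W over total scores: {kendalls_w(totals):.2f}")
```

In this framing, the study's comparison of individual versus collaborative evaluation amounts to comparing such an agreement statistic computed over all eight independent reviewers against one computed over the scores that pairs agreed on after discussion.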




    Published In

    ESEM '10: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
    September 2010
    423 pages
    ISBN: 9781450300391
    DOI: 10.1145/1852786


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tag

    1. quality evaluation

    Qualifiers

    • Research-article

    Conference

    ESEM '10

    Acceptance Rates

    ESEM '10 Paper Acceptance Rate: 30 of 102 submissions, 29%
    Overall Acceptance Rate: 130 of 594 submissions, 22%



