DOI: 10.1145/1852786.1852789

Can we evaluate the quality of software engineering experiments?

Published: 16 September 2010

Abstract

Context: The authors wanted to assess whether the quality of published human-centric software engineering experiments was improving. This required a reliable means of assessing the quality of such experiments. Aims: The aims of the study were to confirm the usability of a quality evaluation checklist, to determine how many reviewers are needed per paper that reports an experiment, and to specify an appropriate process for evaluating quality. Method: Eight reviewers evaluated four papers describing human-centric software engineering experiments using a quality checklist with nine questions. We conducted the study in two parts: the first was based on individual assessments and the second on collaborative evaluations. Results: Inter-rater reliability was poor for individual assessments but much better for joint evaluations. Four reviewers working in two pairs with discussion were more reliable than eight reviewers with no discussion. The sum of the nine criteria was more reliable than individual questions or a simple overall assessment. Conclusions: If quality evaluation is critical, more than two reviewers are required and a round of discussion is necessary. We advise using quality criteria and basing the final assessment on the sum of the aggregated criteria. Our results are limited by the small number of papers used and the relatively extensive expertise of the reviewers. In addition, the results of the second part of the study could have been affected by the removal of the time restriction on the review and by the consultation process.
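
The abstract does not name the reliability statistic the authors used, so the sketch below is only illustrative: it sums the nine checklist answers into a per-paper quality score (the aggregation the study found most reliable) and computes Kendall's coefficient of concordance (W) as one common way to gauge agreement among several reviewers scoring the same papers. The simulated data, function names, and the choice of W are assumptions for illustration, not the paper's reported method.

```python
# Illustrative sketch only; the checklist answers, the use of Kendall's W,
# and all names below are assumptions, not the method reported in the paper.
import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """Kendall's coefficient of concordance for a (reviewers x papers)
    matrix of scores; 0 = no agreement, 1 = perfect agreement.
    (No tie correction, for brevity.)"""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape                                   # m reviewers, n papers
    ranks = np.vstack([rankdata(row) for row in scores])  # rank the papers per reviewer
    rank_sums = ranks.sum(axis=0)                         # rank sum per paper
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical data: 8 reviewers each answer 9 yes/no checklist questions
# (1 = criterion met) for 4 papers, mirroring the study's set-up.
rng = np.random.default_rng(0)
answers = rng.integers(0, 2, size=(8, 4, 9))

# Aggregate each reviewer's nine answers into a total quality score per paper,
# the summary the study found more reliable than any single question.
totals = answers.sum(axis=2)            # shape (8 reviewers, 4 papers)

print(f"Kendall's W over total scores: {kendalls_w(totals):.2f}")
```

In this framing, the study's comparison of individual versus collaborative evaluation amounts to comparing such an agreement statistic computed over all eight independent reviewers against one computed over the scores that pairs agreed on after discussion.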




    Published In

    ESEM '10: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
    September 2010
    423 pages
    ISBN: 9781450300391
    DOI: 10.1145/1852786


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tag

    1. quality evaluation

    Qualifiers

    • Research-article

    Conference

    ESEM '10

    Acceptance Rates

    ESEM '10 Paper Acceptance Rate: 30 of 102 submissions, 29%
    Overall Acceptance Rate: 130 of 594 submissions, 22%



