DOI: 10.1145/3239235.3239239

Comparing techniques for aggregating interrelated replications in software engineering

Published: 11 October 2018

Abstract

Context: Researchers from different groups and institutions are collaborating to build groups of interrelated replications. Applying unsuitable techniques to aggregate the results of interrelated replications may undermine the reliability of joint conclusions.
Objectives: To compare the advantages and disadvantages of the techniques used to aggregate the results of interrelated replications in Software Engineering (SE).
Method: We conducted a literature review to identify the techniques used to aggregate the results of interrelated replications in SE. We then analyzed a prototypical group of interrelated replications in SE with each of the techniques identified, and checked whether the advantages and disadvantages attributed to each technique in mature experimental disciplines such as medicine materialize in the SE context.
Results: Narrative synthesis and aggregation of p-values do not exploit all the information contained in the raw data when providing joint conclusions. Aggregated Data (AD) meta-analysis provides visual summaries of results and allows experiment-level moderators to be assessed. Individual Participant Data (IPD) meta-analysis allows results to be interpreted in natural units and both experiment-level and participant-level moderators to be assessed.
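To make the contrast concrete, here is a minimal Python sketch of aggregation of p-values versus AD meta-analysis. It is an illustration, not the paper's analysis: every number is invented, and the scipy/statsmodels calls are one common way to compute these techniques.

```python
# Hypothetical aggregation of three replications' results.
# All numbers (p-values, means, SDs, sample sizes) are invented.
import numpy as np
from scipy.stats import combine_pvalues
from statsmodels.stats.meta_analysis import combine_effects, effectsize_smd

# Aggregation of p-values (Fisher's method): yields a single joint p-value
# but discards effect sizes and their direction.
stat, joint_p = combine_pvalues([0.04, 0.20, 0.11], method="fisher")
print(f"Fisher's chi-squared = {stat:.2f}, joint p = {joint_p:.3f}")

# AD meta-analysis: pool per-replication standardized mean differences.
# Treatment (1) vs. control (2) summaries for three replications:
m1, s1, n1 = np.array([78.0, 74.5, 80.2]), np.array([10.0, 12.0, 9.5]), np.array([24, 30, 18])
m2, s2, n2 = np.array([70.0, 71.0, 72.5]), np.array([11.0, 13.5, 10.0]), np.array([24, 30, 18])
smd, var_smd = effectsize_smd(m1, s1, n1, m2, s2, n2)

res = combine_effects(smd, var_smd, method_re="chi2")  # DerSimonian-Laird
print(res.summary_frame())  # pooled fixed/random effects with confidence intervals
# res.plot_forest()          # the visual summary (forest plot) mentioned above
```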
Conclusion: All the information contained in the raw data should be used to provide joint conclusions. AD and IPD meta-analysis, used in tandem, appear suitable for analyzing groups of interrelated replications in SE.
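For completeness, a similarly hypothetical sketch of one-stage IPD meta-analysis: the pooled raw data are fitted with a linear mixed model whose random intercept captures the clustering of participants within replications. All data and column names are made up.

```python
# One-stage IPD meta-analysis sketch: a mixed model over pooled raw data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for exp_id, true_effect in [("EXP1", 6.0), ("EXP2", 3.5), ("EXP3", 5.0)]:
    for treated in (0, 1):
        scores = 70 + true_effect * treated + rng.normal(0, 10, size=25)
        rows += [{"experiment": exp_id, "treatment": treated, "quality": s}
                 for s in scores]
df = pd.DataFrame(rows)

# A random intercept per experiment models the clustering of participants
# within replications; the treatment coefficient estimates the pooled effect
# in the outcome's natural units (here, 'quality' points).
fit = smf.mixedlm("quality ~ treatment", data=df, groups=df["experiment"]).fit()
print(fit.summary())
```

A participant-level moderator (say, experience) would enter the formula as an interaction term; this is what lets IPD assess participant-level moderators that AD meta-analysis can only approximate through study-level summaries.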


Cited By

  • (2020) Reasoning about Uncertainty in Empirical Results. Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, 140-149. DOI: 10.1145/3383219.3383234. Online publication date: 15-Apr-2020.
  • (2020) Guidelines for evaluating bug-assignment research. Journal of Software: Evolution and Process, e2250. DOI: 10.1002/smr.2250. Online publication date: 3-Mar-2020.
  • (2019) A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications. IEEE Transactions on Software Engineering, 1-1. DOI: 10.1109/TSE.2019.2935720. Online publication date: 2019.


    Published In

    ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
    October 2018
    487 pages
    ISBN: 9781450358231
    DOI: 10.1145/3239235

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Badges

    • Best Paper

    Author Tags

    1. AD
    2. IPD
    3. experimentation
    4. meta-analysis
    5. replication

    Qualifiers

    • Research-article

    Funding Sources

    • Spanish Ministry of Science and Innovation

    Conference

    ESEM '18

    Acceptance Rates

    Overall Acceptance Rate: 130 of 594 submissions, 22%

