DOI: 10.1145/3239235.3239239

Comparing techniques for aggregating interrelated replications in software engineering

Published: 11 October 2018

Abstract

Context: Researchers from different groups and institutions are collaborating to build groups of interrelated replications. Applying unsuitable techniques to aggregate the results of interrelated replications may undermine the reliability of joint conclusions.
Objectives: To compare the advantages and disadvantages of the techniques used to aggregate the results of interrelated replications in Software Engineering (SE).
Method: We conducted a literature review to identify the techniques used to aggregate the results of interrelated replications in SE. We then analyzed a prototypical group of interrelated replications in SE with each of the techniques identified, and checked whether the advantages and disadvantages attributed to each technique in mature experimental disciplines such as medicine materialize in the SE context.
Results: Narrative synthesis and aggregation of p-values do not exploit all the information contained in the raw data when providing joint conclusions. Aggregated Data (AD) meta-analysis provides visual summaries of results and allows experiment-level moderators to be assessed. Individual Participant Data (IPD) meta-analysis allows results to be interpreted in natural units and both experiment-level and participant-level moderators to be assessed.
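To make the contrast concrete, here is a minimal Python sketch of aggregation of p-values versus AD meta-analysis. It is an illustration, not the paper's analysis: every number is invented, and the scipy/statsmodels calls are one common way to compute these techniques.

```python
# Hypothetical aggregation of three replications' results.
# All numbers (p-values, means, SDs, sample sizes) are invented.
import numpy as np
from scipy.stats import combine_pvalues
from statsmodels.stats.meta_analysis import combine_effects, effectsize_smd

# Aggregation of p-values (Fisher's method): yields a single joint p-value
# but discards effect sizes and their direction.
stat, joint_p = combine_pvalues([0.04, 0.20, 0.11], method="fisher")
print(f"Fisher's chi-squared = {stat:.2f}, joint p = {joint_p:.3f}")

# AD meta-analysis: pool per-replication standardized mean differences.
# Treatment (1) vs. control (2) summaries for three replications:
m1, s1, n1 = np.array([78.0, 74.5, 80.2]), np.array([10.0, 12.0, 9.5]), np.array([24, 30, 18])
m2, s2, n2 = np.array([70.0, 71.0, 72.5]), np.array([11.0, 13.5, 10.0]), np.array([24, 30, 18])
smd, var_smd = effectsize_smd(m1, s1, n1, m2, s2, n2)

res = combine_effects(smd, var_smd, method_re="chi2")  # DerSimonian-Laird
print(res.summary_frame())  # pooled fixed/random effects with confidence intervals
# res.plot_forest()          # the visual summary (forest plot) mentioned above
```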
Conclusion: All the information contained in the raw data should be used to provide joint conclusions. AD and IPD meta-analysis, used in tandem, appear suitable for analyzing groups of interrelated replications in SE.
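For completeness, a similarly hypothetical sketch of one-stage IPD meta-analysis: the pooled raw data are fitted with a linear mixed model whose random intercept captures the clustering of participants within replications. All data and column names are made up.

```python
# One-stage IPD meta-analysis sketch: a mixed model over pooled raw data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for exp_id, true_effect in [("EXP1", 6.0), ("EXP2", 3.5), ("EXP3", 5.0)]:
    for treated in (0, 1):
        scores = 70 + true_effect * treated + rng.normal(0, 10, size=25)
        rows += [{"experiment": exp_id, "treatment": treated, "quality": s}
                 for s in scores]
df = pd.DataFrame(rows)

# A random intercept per experiment models the clustering of participants
# within replications; the treatment coefficient estimates the pooled effect
# in the outcome's natural units (here, 'quality' points).
fit = smf.mixedlm("quality ~ treatment", data=df, groups=df["experiment"]).fit()
print(fit.summary())
```

A participant-level moderator (say, experience) would enter the formula as an interaction term; this is what lets IPD assess participant-level moderators that AD meta-analysis can only approximate through study-level summaries.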


Cited By

  • (2020) Reasoning about Uncertainty in Empirical Results. Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, 140-149. DOI: 10.1145/3383219.3383234. Online publication date: 15-Apr-2020.
  • (2020) Guidelines for evaluating bug-assignment research. Journal of Software: Evolution and Process, e2250. DOI: 10.1002/smr.2250. Online publication date: 3-Mar-2020.
  • (2019) A Procedure and Guidelines for Analyzing Groups of Software Engineering Replications. IEEE Transactions on Software Engineering, 1-1. DOI: 10.1109/TSE.2019.2935720. Online publication date: 2019.


    Published In

    ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
    October 2018
    487 pages
    ISBN: 9781450358231
    DOI: 10.1145/3239235

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Badges

    • Best Paper

    Author Tags

    1. AD
    2. IPD
    3. experimentation
    4. meta-analysis
    5. replication

    Qualifiers

    • Research-article

    Funding Sources

    • Spanish Ministry of Science and Innovation

    Conference

    ESEM '18

    Acceptance Rates

    Overall Acceptance Rate: 130 of 594 submissions, 22%

