
A Survey of Controlled Experiments in Software Engineering

Published: 01 September 2005

Abstract

The classical method for identifying cause-effect relationships is to conduct controlled experiments. This paper reports on the current state of how controlled experiments in software engineering are conducted and the extent to which relevant information is reported. Among the 5,453 scientific articles published in 12 leading software engineering journals and conferences in the decade from 1993 to 2002, 103 articles (1.9 percent) reported controlled experiments in which individuals or teams performed one or more software engineering tasks. This survey quantitatively characterizes the topics of the experiments and their subjects (number of subjects, students versus professionals, recruitment, and rewards for participation), tasks (type of task, duration, and type and size of application), and environments (location, development tools). Furthermore, the survey reports on how internal and external validity is addressed and the extent to which experiments are replicated. The gathered data reflect the relevance of software engineering experiments to industrial practice and the scientific maturity of software engineering research.
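The headline statistic follows directly from the two counts given in the abstract: 103 of 5,453 articles is roughly 1.9 percent. As a quick arithmetic check, the short Python sketch below re-derives the figure (the variable names are illustrative, not taken from the paper):

    # Re-derive the proportion of surveyed articles that report
    # controlled experiments, using the counts quoted in the abstract.
    total_articles = 5453       # articles in 12 leading journals/conferences, 1993-2002
    experiment_articles = 103   # articles reporting controlled experiments

    share = 100 * experiment_articles / total_articles
    print(f"{share:.1f} percent")   # prints "1.9 percent", matching the abstract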



Reviews

Larry Bernstein

This is a very important and insightful paper. By digging through the limited literature on controlled software engineering experiments, the authors have done a yeoman's job. They show clearly that planned methods work and that agile methods also work; as with all software engineering methods, which works better depends on the nature of the problem. The authors show what has been done and conclude that more experiments are needed. In their summary, they point out that "although as many as 108 institutions from 19 countries were involved in conducting the [software engineering] experiments, a relatively low proportion of software engineering articles (1.9 percent) report controlled experiments." The authors go on to excuse this miscarriage of professional ethics, but they should not. Their data is an indictment of computer science and software engineering. Enough money is spent, much of it wasted on bankrupt software development, that the field demands controlled experiments. Repeatable experiments can elevate better hypotheses to the level of theory. In section 9, the authors state that "only 18 percent of the surveyed experiments were replications." We need to mount a large effort, and obtain the necessary resources, to build a truly scientific theory of software design and implementation. The National Science Foundation in the US funds only a very limited amount of this kind of work. I was pleased to see, as the authors show in Table 4, that so many countries are participating in this global search, with Visaggio and Lanubile at the University of Bari in Italy leading the way.

Online Computing Reviews Service



Published In

IEEE Transactions on Software Engineering, Volume 31, Issue 9
September 2005
88 pages

Publisher

IEEE Press

Publication History

Published: 01 September 2005

Author Tags

  1. Controlled experiments
  2. empirical software engineering
  3. research methodology
  4. survey

Qualifiers

  • Research-article


Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Oct 2024.


Cited By

  • (2024) "Establishing Metrics to Encourage Broader Use of Atomic Requirements - A Call for Exchange and Experimentation," ACM SIGSOFT Software Engineering Notes, vol. 49, no. 3, pp. 23-26, doi: 10.1145/3672089.3672096, 18 July 2024.
  • (2024) "Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks," Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 699-721, doi: 10.1145/3643758, 12 July 2024.
  • (2024) "An empirical evaluation of RAIDE," Science of Computer Programming, vol. 231, doi: 10.1016/j.scico.2023.103013, 1 Jan. 2024.
  • (2024) "Guidelines for using financial incentives in software-engineering experimentation," Empirical Software Engineering, vol. 29, no. 5, doi: 10.1007/s10664-024-10517-w, 10 Aug. 2024.
  • (2023) "Rethinking People Analytics With Inverse Transparency by Design," Proceedings of the ACM on Human-Computer Interaction, vol. 7, no. CSCW2, pp. 1-29, doi: 10.1145/3610083, 4 Oct. 2023.
  • (2023) "What's (Not) Working in Programmer User Studies?" ACM Transactions on Software Engineering and Methodology, vol. 32, no. 5, pp. 1-32, doi: 10.1145/3587157, 24 July 2023.
  • (2023) "How Do Computing Education Researchers Talk About Threats and Limitations?" Proceedings of the 2023 ACM Conference on International Computing Education Research, vol. 1, pp. 381-396, doi: 10.1145/3568813.3600114, 7 Aug. 2023.
  • (2023) "From Code Complexity Metrics to Program Comprehension," Communications of the ACM, vol. 66, no. 5, pp. 52-61, doi: 10.1145/3546576, 21 Apr. 2023.
  • (2023) "Studying the Influence and Distribution of the Human Effort in a Hybrid Fitness Function for Search-Based Model-Driven Engineering," IEEE Transactions on Software Engineering, vol. 49, no. 12, pp. 5189-5202, doi: 10.1109/TSE.2023.3329730, 1 Dec. 2023.
  • (2023) "Construct Validity in Software Engineering," IEEE Transactions on Software Engineering, vol. 49, no. 3, pp. 1374-1396, doi: 10.1109/TSE.2022.3176725, 1 Mar. 2023.
