Fair and balanced?: bias in bug-fix datasets

Published: 24 August 2009

Abstract

Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.
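The extraction step the abstract describes is typically implemented by scanning commit messages for bug-report identifiers and joining the matches against the bug tracker. The sketch below (plain Python; all project data, file names, and bug IDs are hypothetical, and the regex is a stand-in for whatever linking heuristic a real tool would use) illustrates both that linking step and the kind of fairness check the paper motivates: comparing the severity mix of the linked fixes against all fixed bugs.

    import re
    from collections import Counter

    # Commit messages that mention a bug identifier ("fixed bug 12345",
    # "patch for #12345") are the usual signal for linking a change to a
    # bug report. This pattern is illustrative, not the paper's exact rule.
    BUG_REF = re.compile(
        r"(?:bug|fix(?:ed|es)?|defect|patch)\s*(?:for\s*)?#?(\d{3,})",
        re.IGNORECASE)

    def linked_bug_ids(message):
        """Return the set of bug IDs referenced in a commit message."""
        return {int(m) for m in BUG_REF.findall(message)}

    # Hypothetical inputs: a commit log and a bug-tracker export.
    commits = [
        {"files": ["net/Socket.java"], "msg": "Fixed bug 4711: NPE on close"},
        {"files": ["ui/Dialog.java"],  "msg": "cleanup, minor refactor"},
        {"files": ["net/Socket.java"], "msg": "patch for #4712"},
    ]
    fixed_bugs = {                      # all bugs marked FIXED in the tracker
        4711: {"severity": "critical"},
        4712: {"severity": "normal"},
        4713: {"severity": "trivial"},  # fixed, but never named in a commit
    }

    # Build the bug-fix dataset: (file, bug) pairs recoverable from the log.
    dataset = [(f, b) for c in commits
               for b in linked_bug_ids(c["msg"]) if b in fixed_bugs
               for f in c["files"]]

    # Fairness check: compare the severity mix of the linked fixes against
    # all fixed bugs. A mismatch means the extracted dataset is a skewed
    # sample of the full fix population.
    linked = {b for _, b in dataset}
    all_sev = Counter(v["severity"] for v in fixed_bugs.values())
    got_sev = Counter(fixed_bugs[b]["severity"] for b in linked)
    for sev, total in all_sev.items():
        print(f"{sev}: {got_sev[sev]} of {total} fixes linked")

In this toy data the trivial fix never appears in the commit log, so the linked dataset over-represents severe bugs; a prediction model trained on such a subset inherits exactly the sort of skew the paper measures.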



Published In

ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
August 2009
408 pages
ISBN: 9781605580012
DOI: 10.1145/1595696

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tag

  1. bias

Qualifiers

  • Research-article

Conference

ESEC/FSE '09: Joint 12th European Software Engineering Conference
August 24-28, 2009
Amsterdam, The Netherlands

Acceptance Rates

ESEC/FSE '09 Paper Acceptance Rate: 32 of 217 submissions, 15%
Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 52
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Oct 2024

Cited By

  • (2024) Reevaluating the Defect Proneness of Atoms of Confusion in Java Systems. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 10.1145/3674805.3686677, pp. 154-164. Online publication date: 24-Oct-2024.
  • (2024) Improving Issue-PR Link Prediction via Knowledge-aware Heterogeneous Graph Learning. IEEE Transactions on Software Engineering, 10.1109/TSE.2024.3408448, pp. 1-19. Online publication date: 2024.
  • (2024) Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel. IEEE Transactions on Software Engineering, 10.1109/TSE.2024.3406718, 50(9), pp. 2219-2239. Online publication date: Sep-2024.
  • (2024) Deriving change-prone thresholds from software evolution using ROC curves. The Journal of Supercomputing, 10.1007/s11227-024-06366-5. Online publication date: 20-Jul-2024.
  • (2023) A Comprehensive Taxonomy for Prediction Models in Software Engineering. Information, 10.3390/info14020111, 14(2), p. 111. Online publication date: 10-Feb-2023.
  • (2023) Seeing the Whole Elephant: Systematically Understanding and Uncovering Evaluation Biases in Automated Program Repair. ACM Transactions on Software Engineering and Methodology, 10.1145/3561382, 32(3), pp. 1-37. Online publication date: 27-Apr-2023.
  • (2023) Aide-mémoire: Improving a Project's Collective Memory via Pull Request-Issue Links. ACM Transactions on Software Engineering and Methodology, 10.1145/3542937, 32(2), pp. 1-36. Online publication date: 29-Mar-2023.
  • (2023) Inconsistent Defect Labels: Essence, Causes, and Influence. IEEE Transactions on Software Engineering, 10.1109/TSE.2022.3156787, 49(2), pp. 586-610. Online publication date: 1-Feb-2023.
  • (2023) SyntaxLineDP: a Line-level Software Defect Prediction Model based on Extended Syntax Information. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 10.1109/QRS60937.2023.00018, pp. 83-94. Online publication date: 22-Oct-2023.
  • (2023) Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection. 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 10.1109/ICPC58990.2023.00031, pp. 169-180. Online publication date: May-2023.
