Fair and balanced?: bias in bug-fix datasets

Published: 24 August 2009

Abstract

Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.
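The extraction step the abstract describes is typically implemented by scanning commit messages for bug-report identifiers and joining the matches against the bug tracker. The sketch below (plain Python; all project data, file names, and bug IDs are hypothetical, and the regex is a stand-in for whatever linking heuristic a real tool would use) illustrates both that linking step and the kind of fairness check the paper motivates: comparing the severity mix of the linked fixes against all fixed bugs.

    import re
    from collections import Counter

    # Commit messages that mention a bug identifier ("fixed bug 12345",
    # "patch for #12345") are the usual signal for linking a change to a
    # bug report. This pattern is illustrative, not the paper's exact rule.
    BUG_REF = re.compile(
        r"(?:bug|fix(?:ed|es)?|defect|patch)\s*(?:for\s*)?#?(\d{3,})",
        re.IGNORECASE)

    def linked_bug_ids(message):
        """Return the set of bug IDs referenced in a commit message."""
        return {int(m) for m in BUG_REF.findall(message)}

    # Hypothetical inputs: a commit log and a bug-tracker export.
    commits = [
        {"files": ["net/Socket.java"], "msg": "Fixed bug 4711: NPE on close"},
        {"files": ["ui/Dialog.java"],  "msg": "cleanup, minor refactor"},
        {"files": ["net/Socket.java"], "msg": "patch for #4712"},
    ]
    fixed_bugs = {                      # all bugs marked FIXED in the tracker
        4711: {"severity": "critical"},
        4712: {"severity": "normal"},
        4713: {"severity": "trivial"},  # fixed, but never named in a commit
    }

    # Build the bug-fix dataset: (file, bug) pairs recoverable from the log.
    dataset = [(f, b) for c in commits
               for b in linked_bug_ids(c["msg"]) if b in fixed_bugs
               for f in c["files"]]

    # Fairness check: compare the severity mix of the linked fixes against
    # all fixed bugs. A mismatch means the extracted dataset is a skewed
    # sample of the full fix population.
    linked = {b for _, b in dataset}
    all_sev = Counter(v["severity"] for v in fixed_bugs.values())
    got_sev = Counter(fixed_bugs[b]["severity"] for b in linked)
    for sev, total in all_sev.items():
        print(f"{sev}: {got_sev[sev]} of {total} fixes linked")

In this toy data the trivial fix never appears in the commit log, so the linked dataset over-represents severe bugs; a prediction model trained on such a subset inherits exactly the sort of skew the paper measures.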



Published In

ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
August 2009
408 pages
ISBN: 9781605580012
DOI: 10.1145/1595696

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tag

  1. bias

Qualifiers

  • Research-article

Conference

ESEC/FSE '09: Joint 12th European Software Engineering Conference
August 24-28, 2009
Amsterdam, The Netherlands

Acceptance Rates

ESEC/FSE '09 Paper Acceptance Rate: 32 of 217 submissions, 15%
Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 52
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Oct 2024

Cited By

  • (2024) Reevaluating the Defect Proneness of Atoms of Confusion in Java Systems. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 10.1145/3674805.3686677, pp. 154-164. Online publication date: 24-Oct-2024.
  • (2024) Improving Issue-PR Link Prediction via Knowledge-aware Heterogeneous Graph Learning. IEEE Transactions on Software Engineering, 10.1109/TSE.2024.3408448, pp. 1-19. Online publication date: 2024.
  • (2024) Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel. IEEE Transactions on Software Engineering, 10.1109/TSE.2024.3406718, 50(9), pp. 2219-2239. Online publication date: Sep-2024.
  • (2024) Deriving change-prone thresholds from software evolution using ROC curves. The Journal of Supercomputing, 10.1007/s11227-024-06366-5. Online publication date: 20-Jul-2024.
  • (2023) A Comprehensive Taxonomy for Prediction Models in Software Engineering. Information, 10.3390/info14020111, 14(2), p. 111. Online publication date: 10-Feb-2023.
  • (2023) Seeing the Whole Elephant: Systematically Understanding and Uncovering Evaluation Biases in Automated Program Repair. ACM Transactions on Software Engineering and Methodology, 10.1145/3561382, 32(3), pp. 1-37. Online publication date: 27-Apr-2023.
  • (2023) Aide-mémoire: Improving a Project's Collective Memory via Pull Request-Issue Links. ACM Transactions on Software Engineering and Methodology, 10.1145/3542937, 32(2), pp. 1-36. Online publication date: 29-Mar-2023.
  • (2023) Inconsistent Defect Labels: Essence, Causes, and Influence. IEEE Transactions on Software Engineering, 10.1109/TSE.2022.3156787, 49(2), pp. 586-610. Online publication date: 1-Feb-2023.
  • (2023) SyntaxLineDP: a Line-level Software Defect Prediction Model based on Extended Syntax Information. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 10.1109/QRS60937.2023.00018, pp. 83-94. Online publication date: 22-Oct-2023.
  • (2023) Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection. 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 10.1109/ICPC58990.2023.00031, pp. 169-180. Online publication date: May-2023.
