research-article

A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes

Authors:

Daniel Alencar da Costa,

Shane McIntosh,

Roberta Coelho,

Ahmed E. Hassan

IEEE Transactions on Software Engineering, Volume 43, Issue 7

Pages 641 - 657

https://doi.org/10.1109/TSE.2016.2616306

Published: 01 July 2017

Abstract

The approach proposed by Śliwerski, Zimmermann, and Zeller (SZZ) for identifying bug-introducing changes is at the foundation of several research areas within the software engineering discipline. Despite the foundational role of SZZ, little effort has been made to evaluate its results. Such an evaluation is a challenging task because the ground truth is not readily available. Acknowledging these challenges, we propose a framework to evaluate the results of alternative SZZ implementations. The framework evaluates the following criteria: (1) the earliest bug appearance, (2) the future impact of changes, and (3) the realism of bug introduction. We use the proposed framework to evaluate five SZZ implementations using data from ten open source projects. We find that previously proposed improvements to SZZ tend to inflate the number of incorrectly identified bug-introducing changes. We also find that a single bug-introducing change may be blamed for introducing hundreds of future bugs. Furthermore, we find that SZZ implementations report that at least 46 percent of the bugs are caused by bug-introducing changes that are years apart from one another. Such results suggest that current SZZ implementations still lack mechanisms to accurately identify bug-introducing changes. Our proposed framework provides a systematic means for evaluating the data that is generated by a given SZZ implementation.


Index Terms

A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. Software management
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. Software configuration management and version control systems

Index terms have been assigned to the content through auto-classification.


Information & Contributors

Information

Published In

IEEE Transactions on Software Engineering Volume 43, Issue 7

July 2017

104 pages

ISSN:0098-5589

0098-5589 © 2016 IEEE.

Publisher

IEEE Press

Qualifiers

Research-article

Citations

Cited By

Jiang M, Jiang J, Wu T, Ma Z, Luo X, Zhou Y (2024). Understanding Vulnerability Inducing Commits of the Linux Kernel. ACM Transactions on Software Engineering and Methodology 33:7 (1-28). DOI: 10.1145/3672452. Online publication date: 14-Jun-2024.
https://dl.acm.org/doi/10.1145/3672452
Yu J, Fu M, Ignatiev A, Tantithamthavorn C, Stuckey P (2024). A Formal Explainer for Just-In-Time Defect Predictions. ACM Transactions on Software Engineering and Methodology 33:7 (1-31). DOI: 10.1145/3664809. Online publication date: 26-Aug-2024.
https://dl.acm.org/doi/10.1145/3664809
Gu K, Zhang Y, Cao J, Tan X, Yang M, d'Amorim M (2024). How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (62-73). DOI: 10.1145/3663529.3663828. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663828
Olewicki D, Habchi S, Adams B (2024). An Empirical Study on Code Review Activity Prediction and Its Impact in Practice. Proceedings of the ACM on Software Engineering 1:FSE (2238-2260). DOI: 10.1145/3660806. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660806
Begoug M, Chouchen M, Ouni A, Abdullah Alomar E, Mkaouer M, Spinellis D, Constantinou E, Bacchelli A (2024). Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC). Proceedings of the 21st International Conference on Mining Software Repositories (100-112). DOI: 10.1145/3643991.3644934. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644934
Le T, Du X, Babar M, Spinellis D, Constantinou E, Bacchelli A (2024). Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study. Proceedings of the 21st International Conference on Mining Software Repositories (716-727). DOI: 10.1145/3643991.3644919. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644919
Sun G, Meidani M, Habchi S, Nayrolles M, Mcintosh S, Roychoudhury A, Paiva A, Abreu R, Storey M, Aniche M, Nagappan N (2024). Code Impact Beyond Disciplinary Boundaries: Constructing a Multidisciplinary Dependency Graph and Analyzing Cross-Boundary Impact. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (122-133). DOI: 10.1145/3639477.3639726. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639477.3639726
Olewicki D, Habchi S, Nayrolles M, Faramarzi M, Chandar S, Adams B, Roychoudhury A, Paiva A, Abreu R, Storey M, Aniche M, Nagappan N (2024). On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics - Empirical Study on Brown Build and Risk Prediction. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (275-286). DOI: 10.1145/3639477.3639717. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639477.3639717
Guo S, Li D, Huang L, Lv S, Chen R, Li H, Li X, Jiang H (2024). Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction. ACM Transactions on Software Engineering and Methodology 33:4 (1-25). DOI: 10.1145/3637226. Online publication date: 18-Apr-2024.
https://dl.acm.org/doi/10.1145/3637226
Hasan M, Tsantalis N, Alikhanifard P (2024). Refactoring-Aware Block Tracking in Commit History. IEEE Transactions on Software Engineering 50:12 (3330-3350). DOI: 10.1109/TSE.2024.3484586. Online publication date: 1-Dec-2024.
https://dl.acm.org/doi/10.1109/TSE.2024.3484586