
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes

Published: 01 July 2017

Abstract

The approach proposed by Śliwerski, Zimmermann, and Zeller (SZZ) for identifying bug-introducing changes is at the foundation of several research areas within the software engineering discipline. Despite the foundational role of SZZ, little effort has been made to evaluate its results. Such an evaluation is a challenging task because the ground truth is not readily available. Acknowledging these challenges, we propose a framework to evaluate the results of alternative SZZ implementations. The framework evaluates the following criteria: (1) the earliest bug appearance, (2) the future impact of changes, and (3) the realism of bug introduction. We use the proposed framework to evaluate five SZZ implementations using data from ten open source projects. We find that previously proposed improvements to SZZ tend to inflate the number of incorrectly identified bug-introducing changes. We also find that a single bug-introducing change may be blamed for introducing hundreds of future bugs. Furthermore, we find that SZZ implementations report that at least 46 percent of the bugs are caused by bug-introducing changes that are years apart from one another. Such results suggest that current SZZ implementations still lack mechanisms to accurately identify bug-introducing changes. Our proposed framework provides a systematic means of evaluating the data that is generated by a given SZZ implementation.
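
To make criteria (2) and (3) more concrete, the sketch below shows one way such checks could be computed over the output of an SZZ implementation. It is only an illustration under stated assumptions: the abstract does not give the framework's exact metrics, so the data layout (`szz_output`), the helper names (`bugs_per_change`, `span_days`, `share_spanning_years`), and the one-year threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from datetime import datetime

# Hypothetical SZZ output (assumed layout, not the paper's data format):
# bug id -> list of (change id, commit date) blamed for introducing that bug.
szz_output = {
    "BUG-1": [("c1", datetime(2010, 1, 5)), ("c2", datetime(2013, 6, 1))],
    "BUG-2": [("c2", datetime(2013, 6, 1))],
    "BUG-3": [("c2", datetime(2013, 6, 1)), ("c3", datetime(2013, 7, 1))],
}


def bugs_per_change(output):
    """Count how many distinct bugs each change is blamed for."""
    counts = Counter()
    for changes in output.values():
        # Use a set so a change is counted once per bug.
        for change_id in {cid for cid, _ in changes}:
            counts[change_id] += 1
    return counts


def span_days(changes):
    """Days between the earliest and latest change blamed for one bug."""
    dates = [date for _, date in changes]
    return (max(dates) - min(dates)).days


def share_spanning_years(output, threshold_days=365):
    """Fraction of bugs whose blamed changes are at least threshold_days apart.

    The 365-day threshold is an assumption made for this sketch.
    """
    if not output:
        return 0.0
    flagged = [bug for bug, changes in output.items()
               if span_days(changes) >= threshold_days]
    return len(flagged) / len(output)


if __name__ == "__main__":
    print(bugs_per_change(szz_output).most_common(1))
    print(share_spanning_years(szz_output))
```

A change with an unusually high count in `bugs_per_change`, or a bug whose blamed changes lie far apart in time, would be a candidate for manual inspection rather than proof that the SZZ result is wrong.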


    Information

    Published In

    IEEE Transactions on Software Engineering  Volume 43, Issue 7
    July 2017
    104 pages

    Publisher

    IEEE Press

    Qualifiers

    • Research-article

    Cited By

    • (2024) Understanding Vulnerability Inducing Commits of the Linux Kernel. ACM Transactions on Software Engineering and Methodology, 33:7 (1-28). DOI: 10.1145/3672452. Online publication date: 14-Jun-2024
    • (2024) A Formal Explainer for Just-In-Time Defect Predictions. ACM Transactions on Software Engineering and Methodology, 33:7 (1-31). DOI: 10.1145/3664809. Online publication date: 26-Aug-2024
    • (2024) How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (62-73). DOI: 10.1145/3663529.3663828. Online publication date: 10-Jul-2024
    • (2024) An Empirical Study on Code Review Activity Prediction and Its Impact in Practice. Proceedings of the ACM on Software Engineering, 1:FSE (2238-2260). DOI: 10.1145/3660806. Online publication date: 12-Jul-2024
    • (2024) Fine-Grained Just-In-Time Defect Prediction at the Block Level in Infrastructure-as-Code (IaC). Proceedings of the 21st International Conference on Mining Software Repositories (100-112). DOI: 10.1145/3643991.3644934. Online publication date: 15-Apr-2024
    • (2024) Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study. Proceedings of the 21st International Conference on Mining Software Repositories (716-727). DOI: 10.1145/3643991.3644919. Online publication date: 15-Apr-2024
    • (2024) Code Impact Beyond Disciplinary Boundaries: Constructing a Multidisciplinary Dependency Graph and Analyzing Cross-Boundary Impact. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (122-133). DOI: 10.1145/3639477.3639726. Online publication date: 14-Apr-2024
    • (2024) On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics - Empirical Study on Brown Build and Risk Prediction. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (275-286). DOI: 10.1145/3639477.3639717. Online publication date: 14-Apr-2024
    • (2024) Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction. ACM Transactions on Software Engineering and Methodology, 33:4 (1-25). DOI: 10.1145/3637226. Online publication date: 18-Apr-2024
    • (2024) Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel. IEEE Transactions on Software Engineering, 50:9 (2219-2239). DOI: 10.1109/TSE.2024.3406718. Online publication date: 29-May-2024
