research-article

Bench4BL: reproducibility study on the performance of IR-based bug localization

Authors:

Tegawendé F. Bissyandé,

Yves Le Traon

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 61 - 72

https://doi.org/10.1145/3213846.3213856

Published: 12 July 2018

Abstract

In recent years, the use of Information Retrieval (IR) techniques to automate the localization of buggy files, given a bug report, has shown promising results. The abundance of approaches in the literature, however, contrasts with the reality of IR-based bug localization (IRBL) adoption by developers (or even by the research community to complement other research approaches). Presumably, this situation is due to the lack of comprehensive evaluations for state-of-the-art approaches which offer insights into the actual performance of the techniques.

We report on a comprehensive reproduction study of six state-of-the-art IRBL techniques. This study applies the IRBL techniques not only to subjects used in existing studies (old subjects) but also to 46 new subjects (61,431 Java files and 9,459 bug reports). In addition, the study compares two different strategies for matching versions between bug reports and source code files, to highlight some observations related to performance deterioration. We also vary test file inclusion to investigate the effectiveness of IRBL techniques on test files, or their noise impact on performance. Finally, we assess the potential performance gain if duplicate bug reports are leveraged.
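To make the task concrete, the following is a minimal sketch of what "localizing buggy files from a bug report" can look like with a plain TF-IDF vector-space model. It is not any of the six techniques evaluated in the study, which are considerably more elaborate; the function names (`tokenize`, `rank_files`), the smoothed IDF weighting, and the two toy Java snippets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one TF-IDF vector (term -> weight) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch): ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty or all-zero."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, files):
    """Return file names sorted by decreasing similarity to the bug report text."""
    names = list(files)
    vectors, idf = tfidf_vectors([tokenize(files[name]) for name in names])
    # Terms of the report that never occur in the corpus get weight 0.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(bug_report)).items()}
    scores = {name: cosine(query, vec) for name, vec in zip(names, vectors)}
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Hypothetical toy corpus: two source files and one bug report.
toy_files = {
    "ConnectionPool.java": "class ConnectionPool { void releaseConnection() { pool.close(); } }",
    "ReportWriter.java": "class ReportWriter { void writeHeader() { out.print(title); } }",
}
print(rank_files("NullPointerException when releasing a pooled connection", toy_files))
```

In this toy run the only term shared between the report and the corpus is `connection`, so `ConnectionPool.java` is ranked first. The sketch does no stemming (so `releasing` does not match `release`), which hints at why real IRBL techniques add further preprocessing and information sources.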

Index Terms

Bench4BL: reproducibility study on the performance of IR-based bug localization
1. Information systems
  1. Information systems applications
    1. Collaborative and social computing systems and tools
      1. Open source software
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software
      2. Software evolution
    2. Software verification and validation
      1. Empirical software validation
      2. Software defect analysis
        1. Software testing and debugging

Published In

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2018

379 pages

ISBN: 9781450356992

DOI: 10.1145/3213846

General Chair: Frank Tip, Northeastern University, USA

Program Chair: Eric Bodden, University of Paderborn, Germany / Fraunhofer IEM, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISSTA '18

Sponsor:

SIGSOFT

ISSTA '18: International Symposium on Software Testing and Analysis

July 16 - 21, 2018

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Bibliometrics

Total Citations: 50

Total Downloads: 787

Downloads (Last 12 months): 105

Downloads (Last 6 weeks): 9

Reflects downloads up to 31 Jan 2025
