DOI: 10.1145/3213846.3213856
research-article

Bench4BL: reproducibility study on the performance of IR-based bug localization

Published: 12 July 2018

Abstract

In recent years, the use of Information Retrieval (IR) techniques to automate the localization of buggy files, given a bug report, has shown promising results. The abundance of approaches in the literature, however, contrasts with the reality of IR-based bug localization (IRBL) adoption by developers (or even by the research community to complement other research approaches). Presumably, this situation is due to the lack of comprehensive evaluations of state-of-the-art approaches that offer insights into the actual performance of the techniques.
We report on a comprehensive reproduction study of six state-of-the-art IRBL techniques. The study applies the techniques not only to subjects used in existing studies (old subjects) but also to 46 new subjects (61,431 Java files and 9,459 bug reports). In addition, it compares two strategies for matching versions between bug reports and source code files, to highlight observations related to performance deterioration. We also vary test file inclusion to investigate the effectiveness of IRBL techniques on test files, or their noise impact on performance. Finally, we assess the potential performance gain if duplicate bug reports are leveraged.
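
To make the general idea of IR-based bug localization concrete, the following is a minimal sketch, not one of the six techniques evaluated in the study. It assumes a plain TF-IDF model with cosine similarity, treats the bug report as the query and each source file as a document, and ranks files by similarity. The function names (`tokenize`, `rank_files`), the smoothed IDF formula, and the sample files and report are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def rank_files(bug_report, files):
    """Return file names ordered by TF-IDF cosine similarity to the bug report.

    `files` maps a file name to its source text (hypothetical input format).
    """
    docs = {name: Counter(tokenize(src)) for name, src in files.items()}
    n_docs = len(docs)
    # Document frequency: in how many files does each term occur?
    df = Counter(term for counts in docs.values() for term in counts)
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def weigh(counts):
        # Terms never seen in any source file carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weigh(Counter(tokenize(bug_report)))
    scores = {name: cosine(query, weigh(counts)) for name, counts in docs.items()}
    # Highest score first; ties broken by file name for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Illustrative data only; none of this comes from the study's subjects.
files = {
    "LoginController.java": "class LoginController { void handleLogin(String userName, "
                            "String password) { session.create(userName); } }",
    "InvoicePrinter.java": "class InvoicePrinter { void printInvoice(Invoice invoice) "
                           "{ printer.send(invoice.render()); } }",
    "SessionStore.java": "class SessionStore { void create(String userName) "
                         "{ sessions.put(userName, new Session()); } }",
}
report = "Login fails with wrong password: user name is rejected by handleLogin"
ranking = rank_files(report, files)
print(ranking)
```

In this toy setup the file sharing the most distinctive vocabulary with the report ranks first. Real IRBL techniques typically add further signals on top of such a baseline, so this sketch should not be read as representative of their performance.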

ISSTA'18, July 16–21, 2018, Amsterdam, Netherlands. J. Lee, D. Kim, T. F. Bissyandé, W. Jung, and Y. Le Traon


Published In

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2018
379 pages
ISBN: 9781450356992
DOI: 10.1145/3213846
  • General Chair: Frank Tip
  • Program Chair: Eric Bodden

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Reproducibility studies
  2. bug localization
  3. information retrieval

Qualifiers

  • Research-article

Conference

ISSTA '18

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%


Cited By

  • (2024) Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1249-1261). DOI: 10.1145/3650212.3680357. Online publication date: 11-Sep-2024
  • (2024) Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization. Proceedings of the Third ACM/IEEE International Workshop on NL-based Software Engineering (1-8). DOI: 10.1145/3643787.3648028. Online publication date: 20-Apr-2024
  • (2024) RLocator: Reinforcement Learning for Bug Localization. IEEE Transactions on Software Engineering 50:10 (2695-2708). DOI: 10.1109/TSE.2024.3452595. Online publication date: 1-Oct-2024
  • (2024) Predictive Reranking using Code Smells for Information Retrieval Fault Localization. 2024 IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics (SAMI) (000277-000282). DOI: 10.1109/SAMI60510.2024.10432857. Online publication date: 25-Jan-2024
  • (2024) Query Quality Prediction for Text Retrieval-based Bug Localization. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS) (340-351). DOI: 10.1109/QRS62785.2024.00041. Online publication date: 1-Jul-2024
  • (2024) Multi-View Feature Fusion Model for Software Bug Repair Pattern Prediction. Wuhan University Journal of Natural Sciences 28:6 (493-507). DOI: 10.1051/wujns/2023286493. Online publication date: 15-Jan-2024
  • (2024) An extensive replication study of the ABLoTS approach for bug localization. Empirical Software Engineering 29:6. DOI: 10.1007/s10664-024-10537-6. Online publication date: 24-Aug-2024
  • (2023) UniLoc: Unified Fault Localization of Continuous Integration Failures. ACM Transactions on Software Engineering and Methodology 32:6 (1-31). DOI: 10.1145/3593799. Online publication date: 8-May-2023
  • (2023) Code-line-level Bugginess Identification: How Far have We Come, and How Far have We Yet to Go? ACM Transactions on Software Engineering and Methodology 32:4 (1-55). DOI: 10.1145/3582572. Online publication date: 27-May-2023
  • (2023) Impact analysis of bug localization accuracy oriented to bug report. Sixth International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2023) (42). DOI: 10.1117/12.3004582. Online publication date: 16-Aug-2023
