Duplicate Bug Report detection using Named Entity Recognition
Information
Copyright © 2023.
Publisher: Elsevier Science Publishers B. V., Netherlands
Publication History: Published 25 January 2024
Qualifiers: Research-article