Abstract
Technical debt is a metaphor for seeking short-term gains at the expense of long-term code quality. Previous studies have shown that self-admitted technical debt (SATD), which developers introduce intentionally, has strong negative impacts on software development and incurs high maintenance overheads. To help developers identify SATD, researchers have proposed many state-of-the-art methods. However, the effectiveness of current methods still leaves room for improvement, because SATD comments vary widely in length, make up a low proportion of all comments, and are written in diverse styles. Therefore, in this paper, we propose a novel approach based on bidirectional long short-term memory (BiLSTM) networks with an attention mechanism that automatically detects SATD from source code comments. In the BiLSTM, we use a balanced cross-entropy loss function to overcome the class imbalance problem. We experimentally evaluate our approach on a public dataset of 62,566 code comments from ten open source projects. Experimental results show that our approach achieves, on average, a precision of 81.75%, a recall of 72.24% and an F1-score of 75.86%, outperforming the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.
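The abstract names two model-level ingredients, attention over the BiLSTM outputs and a balanced cross-entropy loss, and evaluates with precision, recall and F1. The pure-Python sketch below illustrates each piece under stated assumptions; it is not the authors' implementation. The loss assumes the class-balanced weighting used in holistically-nested edge detection (Xie and Tu, 2015), with β equal to the fraction of non-SATD comments; the attention score w · tanh(h_t) is one common formulation and may differ from the paper's; the function names `balanced_cross_entropy`, `attention_pool` and `precision_recall_f1` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def balanced_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                           eps: float = 1e-12) -> float:
    """Class-balanced binary cross-entropy, averaged over the batch.

    ASSUMPTION: HED-style weighting with beta = |negatives| / |all|.
    SATD (label 1) terms are scaled by beta and non-SATD (label 0) terms by
    1 - beta, so the rare positive class is up-weighted.
    """
    n = len(labels)
    if n == 0 or n != len(probs):
        raise ValueError("probs and labels must be non-empty and equal length")
    beta = sum(1 for y in labels if y == 0) / n
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        if y == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / n


def attention_pool(hidden: List[List[float]],
                   w: List[float]) -> Tuple[List[float], List[float]]:
    """Soft attention over BiLSTM outputs h_1..h_T (each of length d).

    ASSUMPTION: score_t = w . tanh(h_t), alpha = softmax(scores),
    r = sum_t alpha_t * h_t is the comment-level representation.
    """
    scores = [sum(wk * math.tanh(hk) for wk, hk in zip(w, h)) for h in hidden]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    d = len(hidden[0])
    r = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(d)]
    return r, alpha


def precision_recall_f1(pred: Sequence[int],
                        gold: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (SATD) class."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy batch: one SATD comment among four, all predicted with p = 0.5.
    print(balanced_cross_entropy([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0]))
    # Two 2-dimensional hidden states with equal scores give uniform attention.
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

In the toy batch, the single SATD comment receives weight β = 0.75 and each of the three non-SATD comments receives 1 − β = 0.25, so with uniform predictions the positive example contributes as much to the loss as all negatives combined. This is the intended effect of balancing when positives are scarce.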
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61100043, 61902096 and 61702144) and the Key Project of Science and Technology of Zhejiang Province (2017C01010).
Additional information
Dongjin Yu is currently a professor at Hangzhou Dianzi University, China. His research interests include intelligent software engineering, data engineering and service computing. He is the director of the Big Data Institute and the director of the Computer Software Institute of Hangzhou Dianzi University. He is a member of IEEE and a senior member of the China Computer Federation (CCF). He is also a member of the Technical Committee of Software Engineering CCF (TCSE CCF) and the Technical Committee of Service Computing CCF (TCSC CCF).
Lin Wang received the bachelor's degree in 2017 from the School of Computer Science, Hangzhou Dianzi University, China. She is currently a graduate student at Hangzhou Dianzi University, China. Her current research interests include mining software repositories and software maintenance.
Xin Chen received the PhD degree in software engineering in 2018 from the School of Software, Dalian University of Technology, China. He is currently a lecturer at Hangzhou Dianzi University, China. His research interests include mining software repositories, search-based software engineering, and evolutionary computation. He is a member of the CCF and the ACM.
Jie Chen is an assistant professor in the College of Computer Science at Hangzhou Dianzi University, China. She received the PhD degree from the Lab of Internet Software Technologies, Institute of Software, Chinese Academy of Sciences (ISCAS), China in 2016. She was a visiting scholar in the Department of Computer Science, University of Massachusetts Amherst, USA from September 2012 to September 2013. Her research interests are in software process simulation, resource scheduling and code analysis.
About this article
Cite this article
Yu, D., Wang, L., Chen, X. et al. Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt. Front. Comput. Sci. 15, 154208 (2021). https://doi.org/10.1007/s11704-020-9281-z
DOI: https://doi.org/10.1007/s11704-020-9281-z