
Neural Network-based Detection of Self-Admitted Technical Debt: From Performance to Explainability

Published: 29 July 2019

Abstract

    Technical debt is a metaphor for the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD), a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a significant challenge to the accuracy of pattern-based and traditional text-mining-based SATD detection, especially for cross-project deployment. Furthermore, although the traditional text-mining-based method outperforms the pattern-based method in prediction accuracy, the text features it uses are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. To improve the accuracy of SATD prediction, especially cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify the key phrases and patterns in code comments that are most relevant to SATD. We have conducted an extensive set of experiments with 62,566 code comments from 10 open-source projects and a user study with 150 comments from another three projects. Our evaluation confirms the effectiveness of different aspects of our approach and its superior performance, generalizability, adaptability, and explainability over current state-of-the-art traditional text-mining-based methods for SATD classification.
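    The two ideas at the heart of the abstract (a sentence-level CNN over word embeddings, and backtracking max-pooled activations to the n-grams that triggered them) can be illustrated with a toy sketch. This is a minimal illustration, not the authors' actual model: the vocabulary, the 8-dimensional random embeddings, the single bigram filter, and all function names here are hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy vocabulary with random 8-dimensional word embeddings. A real
    # sentence CNN would use embeddings trained on code comments.
    vocab = ["todo", "fix", "this", "later", "hack", "returns", "the", "sum"]
    emb = {w: rng.normal(size=8) for w in vocab}

    def conv_activations(tokens, filt, width=2):
        """Slide one convolutional filter of the given width over the
        comment's embedding matrix; return every n-gram window and its
        activation."""
        windows = [tokens[i:i + width] for i in range(len(tokens) - width + 1)]
        feats = [np.concatenate([emb[w] for w in win]) for win in windows]
        acts = [float(np.tanh(f @ filt)) for f in feats]
        return windows, acts

    filt = rng.normal(size=16)          # one filter spanning a bigram (2 x 8 dims)
    comment = ["todo", "fix", "this", "hack", "later"]
    windows, acts = conv_activations(comment, filt)

    pooled = max(acts)                  # 1-max pooling: one feature per filter
    key_ngram = windows[int(np.argmax(acts))]  # the phrase that fired the filter
    ```

    The pooled value is what the classifier sees, but the argmax index points back to the exact n-gram that produced it; collecting these winning n-grams across many filters and comments is, in spirit, how activation-based CNN explanation surfaces human-readable SATD phrases.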




      Published In

      ACM Transactions on Software Engineering and Methodology, Volume 28, Issue 3
      July 2019
      278 pages
      ISSN:1049-331X
      EISSN:1557-7392
      DOI:10.1145/3343019
      • Editor:
      • Mauro Pezzè

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 July 2019
      Accepted: 01 March 2019
      Revised: 01 March 2019
      Received: 01 June 2018
      Published in TOSEM Volume 28, Issue 3


      Author Tags

      1. Self-admitted technical debt
      2. convolutional neural network
      3. cross-project prediction
      4. model adaptability
      5. model explainability
      6. model generalizability

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • NSFC Program
      • Fundamental Research Funds for the Central Universities
      • National Key Research and Development Program of China
      • Project of Science and Technology Research and Development Program of China Railway Corporation

      Article Metrics

      • Downloads (last 12 months): 126
      • Downloads (last 6 weeks): 16
      Reflects downloads up to 26 Jul 2024

      Cited By

      • (2024) What Makes a Good TODO Comment? ACM Transactions on Software Engineering and Methodology 33(6), 1-30. DOI: 10.1145/3664811. Online: 28-Jun-2024
      • (2024) SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt. Proceedings of the 21st International Conference on Mining Software Repositories, 289-293. DOI: 10.1145/3643991.3644880. Online: 15-Apr-2024
      • (2024) Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. DOI: 10.1145/3597503.3639176. Online: 20-May-2024
      • (2024) Self-Admitted Technical Debts Identification: How Far Are We? 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 804-815. DOI: 10.1109/SANER60148.2024.00087. Online: 12-Mar-2024
      • (2024) Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting. IEEE Access 12, 70676-70689. DOI: 10.1109/ACCESS.2024.3402543. Online: 2024
      • (2024) Towards automating self-admitted technical debt repayment. Information and Software Technology 167, 107376. DOI: 10.1016/j.infsof.2023.107376. Online: Mar-2024
      • (2024) BiMNet: A Multimodal Data Fusion Network for continuous circular capsulorhexis Action Segmentation. Expert Systems with Applications 238, 121885. DOI: 10.1016/j.eswa.2023.121885. Online: Mar-2024
      • (2024) Detect software vulnerabilities with weight biases via graph neural networks. Expert Systems with Applications 238, 121764. DOI: 10.1016/j.eswa.2023.121764. Online: Mar-2024
      • (2024) Keyword-labeled self-admitted technical debt and static code analysis have significant relationship but limited overlap. Software Quality Journal 32(2), 391-429. DOI: 10.1007/s11219-023-09655-z. Online: 1-Jun-2024
      • (2024) Quantifying and characterizing clones of self-admitted technical debt in build systems. Empirical Software Engineering 29(2). DOI: 10.1007/s10664-024-10449-5. Online: 26-Feb-2024
