
What Makes a Good TODO Comment?

Published: 28 June 2024

Abstract

Software development is a collaborative process that involves various interactions among individuals and teams. TODO comments in source code play a critical role in managing and coordinating diverse tasks during this process. However, this study finds that a large proportion of TODO comments in open-source projects are left unresolved or take a long time to resolve. About 46.7% of TODO comments in open-source repositories are of low quality (e.g., TODOs that are ambiguous, lack information, or are useless to developers), highlighting the need for better TODO practices. In this study, we investigate four aspects of TODO comment quality in open-source projects: (1) the prevalence of low-quality TODO comments; (2) the key characteristics of high-quality TODO comments; (3) how TODO comments of different quality are managed in practice; and (4) the feasibility of automatically assessing TODO comment quality. Examining 2,863 TODO comments from the top 100 GitHub Java repositories, we propose criteria to identify high-quality TODO comments and provide insights into their optimal composition. We discuss the lifecycle of TODO comments of varying quality. To assist developers, we construct deep learning-based methods that show promising performance in identifying the quality of TODO comments, potentially enhancing development efficiency and code quality.
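
The abstract refers to deep learning-based methods for assessing TODO comment quality; the paper's actual models and data pipeline are not reproduced on this page. As a rough, hypothetical sketch only, the snippet below extracts single-line // TODO comments from Java source with a simple regular expression and scores each one with a fine-tuned binary sequence classifier through the Hugging Face transformers API. The checkpoint name my-org/todo-quality-classifier, the regex, and the label convention (1 = high quality) are illustrative assumptions, not artifacts of the study.

```python
# Illustrative sketch only -- NOT the authors' published pipeline.
# Assumes `torch` and `transformers` are installed and that a fine-tuned
# binary classifier exists under the hypothetical checkpoint name below.
import re
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Matches single-line Java comments such as "// TODO: handle overflow".
TODO_PATTERN = re.compile(r"//\s*TODO[:\s](.*)", re.IGNORECASE)

def extract_todos(java_source: str) -> list[str]:
    """Collect the text of single-line // TODO comments from Java source."""
    return [m.group(1).strip() for m in TODO_PATTERN.finditer(java_source)]

def classify_quality(todos, model_name="my-org/todo-quality-classifier"):
    """Label each TODO as 'high' or 'low' quality (label 1 assumed to mean high)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()
    labels = []
    with torch.no_grad():
        for todo in todos:
            inputs = tokenizer(todo, return_tensors="pt", truncation=True)
            logits = model(**inputs).logits
            labels.append("high" if logits.argmax(dim=-1).item() == 1 else "low")
    return labels

if __name__ == "__main__":
    source = 'int x = 0; // TODO: handle overflow when x exceeds Integer.MAX_VALUE'
    todos = extract_todos(source)
    print(list(zip(todos, classify_quality(todos))))
```

Classifying the comment text alone is a deliberate simplification; in practice the surrounding code context and the comment's history would likely also inform a quality judgment.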


Cited By

  • (2024) Just-In-Time TODO-Missed Commits Detection. IEEE Transactions on Software Engineering 50, 11, 2732–2752. https://doi.org/10.1109/TSE.2024.3405005. Online publication date: 1 November 2024.


Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 6
July 2024
951 pages
EISSN: 1557-7392
DOI: 10.1145/3613693
Editor: Mauro Pezzé

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2024
Online AM: 13 May 2024
Accepted: 03 May 2024
Revised: 20 March 2024
Received: 21 November 2023
Published in TOSEM Volume 33, Issue 6

Author Tags

  1. Documentation
  2. comment quality
  3. comment lifecycle

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Zhejiang Provincial Natural Science Foundation of China
  • ARC Laureate Fellowship
  • Zhejiang Province “JianBingLingYan+X” Research and Development Plan
  • Joint Funds of the Zhejiang Provincial Natural Science Foundation of China
  • Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study
  • Shanghai Sailing Program
  • Zhejiang Provincial Engineering Research Center for Real-time SmartTech in Urban Security Governance

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 290
  • Downloads (Last 6 weeks): 25
Reflects downloads up to 19 Feb 2025

