
Code Structure–Guided Transformer for Source Code Summarization

Published: 13 February 2023

Abstract

Code summaries help developers comprehend programs and reduce the time needed to infer program functionality during software maintenance. Recent efforts resort to deep learning techniques, such as sequence-to-sequence models, to generate accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer remains under-explored in this task domain. In this article, we propose a novel approach named SG-Trans that incorporates code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., the data-flow graph) into the self-attention module of the Transformer as an inductive bias. To further capture the hierarchical characteristics of code, the local information and global structure are distributed across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans improves the METEOR score, a metric widely used to measure generation quality, by 1.4% and 2.0% on two benchmark datasets, respectively.
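The idea of injecting structure into self-attention as an inductive bias can be sketched as restricting some attention heads by a structural mask (e.g., "tokens in the same statement") while leaving other heads global. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the identity (no-op) query/key/value projections, and the mask construction are all hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_guided_attention(x, struct_mask, n_heads, n_struct_heads):
    """Single-layer multi-head self-attention over token embeddings x
    (shape [n_tokens, d_model]). The first `n_struct_heads` heads may
    only attend where struct_mask is True (a structural relation such
    as same-statement membership); the remaining heads attend globally.
    Identity projections keep the sketch minimal."""
    n, d = x.shape
    dh = d // n_heads
    out = np.zeros_like(x)
    for h in range(n_heads):
        # Hypothetical identity projections: each head reads its own slice.
        q = k = v = x[:, h * dh:(h + 1) * dh]
        scores = q @ k.T / np.sqrt(dh)
        if h < n_struct_heads:
            # Structure-biased head: forbid attention outside the mask.
            scores = np.where(struct_mask, scores, -1e9)
        out[:, h * dh:(h + 1) * dh] = softmax(scores) @ v
    return out
```

With an identity mask, a structure-biased head can only attend to the token itself, so its output slice reproduces its input slice; a global head mixes information across all tokens. Stacking layers with local masks in lower layers and global (or data-flow-derived) masks in higher layers mirrors the hierarchical design described in the abstract.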


Cited By

  • (2025) Improving Source Code Pre-Training via Type-Specific Masking. ACM Transactions on Software Engineering and Methodology 34(3), 1–34. DOI: 10.1145/3699599
  • (2025) Understanding the Robustness of Transformer-Based Code Intelligence via Code Transformation: Challenges and Opportunities. IEEE Transactions on Software Engineering 51(2), 521–547. DOI: 10.1109/TSE.2024.3524461
  • (2025) Towards Improving the Performance of Comment Generation Models by Using Bytecode Information. IEEE Transactions on Software Engineering 51(2), 503–520. DOI: 10.1109/TSE.2024.3523713
  • (2025) Automated description generation for software patches. Information and Software Technology 177, 107543. DOI: 10.1016/j.infsof.2024.107543
  • (2024) Automating TODO-missed Methods Detection and Patching. ACM Transactions on Software Engineering and Methodology 34(1), 1–28. DOI: 10.1145/3700793
  • (2024) A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 281–293. DOI: 10.1145/3691620.3695004
  • (2024) What Makes a Good TODO Comment? ACM Transactions on Software Engineering and Methodology 33(6), 1–30. DOI: 10.1145/3664811
  • (2024) Easy over Hard: A Simple Baseline for Test Failures Causes Prediction. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 306–317. DOI: 10.1145/3663529.3663850
  • (2024) Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning. Proceedings of the ACM on Software Engineering 1(FSE), 2355–2377. DOI: 10.1145/3660811
  • (2024) EyeTrans: Merging Human and Machine Attention for Neural Code Summarization. Proceedings of the ACM on Software Engineering 1(FSE), 115–136. DOI: 10.1145/3643732

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 1
    January 2023, 954 pages
    ISSN: 1049-331X
    EISSN: 1557-7392
    DOI: 10.1145/3572890
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 February 2023
    Online AM: 15 July 2022
    Accepted: 24 February 2022
    Revised: 19 February 2022
    Received: 06 June 2021
    Published in TOSEM Volume 32, Issue 1


    Author Tags

    1. Code summary
    2. Transformer
    3. Multi-head attention
    4. Code structure

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Stable Support Plan for Colleges and Universities in Shenzhen
    • Research Grants Council of the Hong Kong Special Administrative Region, China
    • UK Engineering and Physical Sciences Research Council
    • UK Research and Innovation

