Abstract
Code comment generation aims to translate source code into natural language explanations, providing easy-to-understand descriptions for developers who are unfamiliar with the code's functionality. Existing approaches focus on summarizing multiple lines of code with a short text but often fail to explain a single line of code effectively. In this paper, we propose an asynchronous learning model that learns code semantics and generates a fine-grained natural language explanation for each line of code. Unlike coarse-grained code comment generation, this fine-grained explanation helps developers better understand the functionality line by line. The proposed model adopts a type-aware, sketch-based sequence-to-sequence learning method to generate natural language explanations for source code. This method incorporates the types of source code tokens and a mask mechanism into a Long Short-Term Memory (LSTM) network during the encoding and decoding phases. We empirically compare the proposed model with state-of-the-art approaches on a real-world dataset of Python source code and descriptions. Experimental results demonstrate that our model outperforms existing approaches on metrics commonly used for neural machine translation.
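The "type-aware" input described above pairs each source token with its lexical type before encoding. As a minimal, hedged illustration (the paper's own type vocabulary and model architecture are not reproduced here), Python's standard `tokenize` module can produce such token/type pairs for a single line of code:

```python
import io
import tokenize

def type_tagged_tokens(code_line):
    """Pair each token in one line of Python code with its lexical type.

    Hypothetical helper for illustration only: the paper's model consumes
    token types alongside tokens, but its exact type vocabulary and
    preprocessing are not specified here.
    """
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(code_line).readline):
        # Skip layout-only tokens that carry no semantic content.
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue
        tagged.append((tok.string, tokenize.tok_name[tok.type]))
    return tagged

pairs = type_tagged_tokens("x = len(items) + 1")
# pairs is a list such as [('x', 'NAME'), ('=', 'OP'), ('len', 'NAME'), ...]
```

A sketch-based decoder could then first emit a template over these type tags and later fill in the concrete identifiers, which is the two-stage idea the abstract refers to.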
Acknowledgments
This work was supported in part by the National Key R&D Program of China (2018YFB1003901), the NSFC Grants (61976163, 61872315 and 61872273), the Advance Research Projects of Civil Aerospace Technology (“Intelligent Distribution Technology for Domestic Satellite Information”), the Technological Innovation Major Projects of Hubei Province (2017AAA125), the Natural Science Foundation of Hubei Province (ZRMS2020000714), the High-End Industry Development Fund of Beijing City (“Smart Home-Oriented AI Open Innovation Platform”), the Science and Technology Program of Wuhan City (2018010401011288), the Open Funding of CETC Key Laboratory of Aerospace Information Applications, and the Xiaomi-WHU AI Lab.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Deng, Y., et al. (2020). From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning. In: Nah, Y., Cui, B., Lee, S.W., Yu, J.X., Moon, Y.S., Whang, S.E. (eds.) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol. 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics