Abstract
Code comment generation aims to translate source code into natural language explanations, providing easy-to-understand descriptions for developers who are unfamiliar with the code's functionality. Existing approaches focus on summarizing multiple lines of code with a short text but often fail to explain a single line of code effectively. In this paper, we propose an asynchronous learning model that learns code semantics and generates a fine-grained natural language explanation for each line of code. Unlike coarse-grained code comment generation, this fine-grained explanation helps developers better understand the functionality line by line. The proposed model adopts a type-aware, sketch-based sequence-to-sequence learning method to generate natural language explanations for source code. This method incorporates the types of source code tokens and a mask mechanism into a Long Short-Term Memory (LSTM) network during the encoding and decoding phases. We empirically compare the proposed model with state-of-the-art approaches on a real-world dataset of Python source code and descriptions. Experimental results demonstrate that our model outperforms existing approaches on metrics commonly used for neural machine translation.
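The "type-aware" input described above pairs each source token with its lexical type before encoding. As a minimal, hedged illustration (the paper's own type vocabulary and model architecture are not reproduced here), Python's standard `tokenize` module can produce such token/type pairs for a single line of code:

```python
import io
import tokenize

def type_tagged_tokens(code_line):
    """Pair each token in one line of Python code with its lexical type.

    Hypothetical helper for illustration only: the paper's model consumes
    token types alongside tokens, but its exact type vocabulary and
    preprocessing are not specified here.
    """
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(code_line).readline):
        # Skip layout-only tokens that carry no semantic content.
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            continue
        tagged.append((tok.string, tokenize.tok_name[tok.type]))
    return tagged

pairs = type_tagged_tokens("x = len(items) + 1")
# pairs is a list such as [('x', 'NAME'), ('=', 'OP'), ('len', 'NAME'), ...]
```

A sketch-based decoder could then first emit a template over these type tags and later fill in the concrete identifiers, which is the two-stage idea the abstract refers to.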
Acknowledgments
This work was supported in part by the National Key R&D Program of China (2018YFB1003901), the NSFC Grants (61976163, 61872315 and 61872273), the Advance Research Projects of Civil Aerospace Technology (“Intelligent Distribution Technology for Domestic Satellite Information”), the Technological Innovation Major Projects of Hubei Province (2017AAA125), the Natural Science Foundation of Hubei Province (ZRMS2020000714), the High-End Industry Development Fund of Beijing City (“Smart Home-Oriented AI Open Innovation Platform”), the Science and Technology Program of Wuhan City (2018010401011288), the Open Funding of CETC Key Laboratory of Aerospace Information Applications, and the Xiaomi-WHU AI Lab.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Deng, Y., et al. (2020). From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning. In: Nah, Y., Cui, B., Lee, S.W., Yu, J.X., Moon, Y.S., Whang, S.E. (eds.) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol. 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7
eBook Packages: Mathematics and Statistics