
From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning

  • Conference paper
  • In: Database Systems for Advanced Applications (DASFAA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12112)


Abstract

Code comment generation aims to translate source code into natural language explanations, providing easy-to-understand descriptions for developers who are unfamiliar with the code's functionality. Existing approaches focus on summarizing multiple lines of code with a short text, but often cannot effectively explain a single line of code. In this paper, we propose an asynchronous learning model that learns code semantics and generates a fine-grained natural language explanation for each line of code. Unlike coarse-grained code comment generation, this fine-grained explanation helps developers better understand the functionality line by line. The proposed model adopts a type-aware sketch-based sequence-to-sequence learning method to generate natural language explanations for source code. This method incorporates the type information of the source code and a mask mechanism into a Long Short-Term Memory (LSTM) network during the encoding and decoding phases. We empirically compare the proposed model with state-of-the-art approaches on real-world datasets of Python source code and descriptions. Experimental results demonstrate that our model outperforms existing approaches on commonly used neural machine translation metrics.
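The abstract outlines the architecture only at a high level: an LSTM encoder-decoder made type-aware and combined with a mask mechanism. The PyTorch snippet below is a minimal, hypothetical sketch of how such a model could be wired up; the concatenated token/type embeddings, the vocabulary-level mask, and all layer sizes are illustrative assumptions rather than the authors' actual design.

```python
# Minimal, hypothetical sketch of a type-aware LSTM encoder-decoder
# (all names, dimensions, and the masking scheme are illustrative assumptions).
import torch
import torch.nn as nn


class TypeAwareSeq2Seq(nn.Module):
    def __init__(self, code_vocab, type_vocab, nl_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        # Separate embeddings for code tokens and their syntactic types.
        self.code_emb = nn.Embedding(code_vocab, emb_dim)
        self.type_emb = nn.Embedding(type_vocab, emb_dim)
        # Encoder reads the concatenated token+type embedding of each code token.
        self.encoder = nn.LSTM(2 * emb_dim, hid_dim, batch_first=True)
        # Decoder generates the natural-language explanation token by token.
        self.nl_emb = nn.Embedding(nl_vocab, emb_dim)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, nl_vocab)

    def forward(self, code_ids, type_ids, nl_in, vocab_mask=None):
        # Encode one line of code: code_ids and type_ids are [batch, src_len].
        src = torch.cat([self.code_emb(code_ids), self.type_emb(type_ids)], dim=-1)
        _, state = self.encoder(src)
        # Decode with teacher forcing from the encoder's final state.
        dec_out, _ = self.decoder(self.nl_emb(nl_in), state)
        logits = self.out(dec_out)  # [batch, tgt_len, nl_vocab]
        if vocab_mask is not None:
            # Stand-in for a mask mechanism: forbid certain target-vocabulary
            # entries (e.g. concrete identifiers while generating a sketch).
            logits = logits.masked_fill(vocab_mask, float("-inf"))
        return logits


# Usage with toy shapes (batch of 2, code length 6, target length 8):
model = TypeAwareSeq2Seq(code_vocab=5000, type_vocab=20, nl_vocab=8000)
code_ids = torch.randint(0, 5000, (2, 6))
type_ids = torch.randint(0, 20, (2, 6))
nl_in = torch.randint(0, 8000, (2, 8))
logits = model(code_ids, type_ids, nl_in)  # -> [2, 8, 8000]
```

Concatenating a type embedding with each token embedding is one straightforward way to expose type information to the encoder; the paper's sketch-based decoding would additionally split generation into a sketch stage and a detail-filling stage.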



Acknowledgments

This work was supported in part by the National Key R&D Program of China (2018YFB1003901), the NSFC Grants (61976163, 61872315 and 61872273), the Advance Research Projects of Civil Aerospace Technology (“Intelligent Distribution Technology for Domestic Satellite Information”), the Technological Innovation Major Projects of Hubei Province (2017AAA125), the Natural Science Foundation of Hubei Province (ZRMS2020000714), the High-End Industry Development Fund of Beijing City (“Smart Home-Oriented AI Open Innovation Platform”), the Science and Technology Program of Wuhan City (2018010401011288), the Open Funding of CETC Key Laboratory of Aerospace Information Applications, and the Xiaomi-WHU AI Lab.

Author information

Correspondence to Xu Chen.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Deng, Y., et al. (2020). From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning. In: Nah, Y., Cui, B., Lee, S.W., Yu, J.X., Moon, Y.S., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol. 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_25
