DOI: 10.1109/ASE.2015.36
research-article

Learning to generate pseudo-code from source code using statistical machine translation

Published: 09 November 2015

Abstract

Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the vast majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, it could be produced on demand without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and found that the generated pseudo-code is largely accurate and aids code understanding.
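As a rough illustration of what "learning the relationship between source code/pseudo-code pairs" can mean in an SMT setting, the sketch below trains IBM Model 1, a classic word-alignment model, on a tiny hand-made corpus. This is not the paper's system: the corpus `TOY_PAIRS`, the function names `train_ibm_model1` and `best_alignment`, and the choice of Model 1 without a NULL token are all assumptions made for this example. It covers only the alignment step and does not generate pseudo-code.

```python
from collections import defaultdict

# Hypothetical toy corpus: (tokenized Python statement, tokenized English pseudo-code).
TOY_PAIRS = [
    (["x", "=", "1"], ["set", "x", "to", "1"]),
    (["y", "=", "2"], ["set", "y", "to", "2"]),
    (["return", "x"], ["return", "x"]),
    (["return", "y"], ["return", "y"]),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(word, token)] ~ P(pseudo-code word | code token) with EM."""
    t = defaultdict(lambda: 1.0)  # uniform start; the E-step normalizes it
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, token) co-occurrence counts
        total = defaultdict(float)  # expected counts per code token
        for tokens, words in pairs:
            for w in words:
                # E-step: split one unit of probability mass for w across tokens.
                norm = sum(t[(w, s)] for s in tokens)
                for s in tokens:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize so probabilities sum to 1 for each code token.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return t


def best_alignment(t, tokens, words):
    """Map each pseudo-code word to the code token most likely to have produced it."""
    return {w: max(tokens, key=lambda s: t[(w, s)]) for w in words}


if __name__ == "__main__":
    table = train_ibm_model1(TOY_PAIRS)
    print(best_alignment(table, ["return", "x"], ["return", "x"]))
```

On this toy data the model learns that the word "return" is better explained by the `return` token than by the variable that follows it. It learns this only from co-occurrence across pairs, with no hand-written rules. A full SMT pipeline would add phrase or tree rule extraction, a language model over the pseudo-code, and a decoder on top of such alignments.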

Published In

ASE '15: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering
November 2015
935 pages
ISBN: 9781509000241

In-Cooperation

• IEEE CS

Publisher

IEEE Press

Author Tags

1. algorithms
2. education
3. statistical approach

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
