DOI: 10.1145/2950290.2950334

Deep API learning

Published: 01 November 2016

Abstract

Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query. We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
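
The encode-then-generate flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `ToyEncoderDecoder`, the tiny vocabularies, the plain tanh recurrent cell, the random untrained weights, and the greedy decoding loop are not taken from DeepAPI, and the importance-based augmentation is not modelled. It only shows how a word sequence of any length is folded into one fixed-length context vector, and how an API sequence is then emitted step by step from that vector.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class ToyEncoderDecoder:
    """Hypothetical toy model; the weights are random, so the output is not meaningful."""

    def __init__(self, words, apis, hidden=8, seed=0):
        rng = random.Random(seed)
        self.word_ids = {w: i for i, w in enumerate(words)}
        self.apis = list(apis) + ["<eos>"]  # end-of-sequence marker
        self.hidden = hidden
        n_words, n_apis = len(self.word_ids), len(self.apis)
        self.enc_in = make_matrix(hidden, n_words, rng)
        self.enc_rec = make_matrix(hidden, hidden, rng)
        self.dec_in = make_matrix(hidden, n_apis, rng)
        self.dec_rec = make_matrix(hidden, hidden, rng)
        self.dec_ctx = make_matrix(hidden, hidden, rng)
        self.out = make_matrix(n_apis, hidden, rng)

    def encode(self, query):
        """Read the query word by word and return the final hidden state."""
        h = [0.0] * self.hidden
        for word in query.split():
            x = one_hot(self.word_ids[word], len(self.word_ids))
            pre = [a + b for a, b in zip(matvec(self.enc_in, x), matvec(self.enc_rec, h))]
            h = [math.tanh(p) for p in pre]
        return h  # fixed length, regardless of query length

    def decode(self, context, max_len=5):
        """Greedily emit API labels conditioned on the context vector."""
        h = [0.0] * self.hidden
        prev = [0.0] * len(self.apis)  # no previous API at the first step
        ctx_term = matvec(self.dec_ctx, context)
        result = []
        for _ in range(max_len):
            pre = [a + b + c for a, b, c in zip(
                matvec(self.dec_in, prev), matvec(self.dec_rec, h), ctx_term)]
            h = [math.tanh(p) for p in pre]
            scores = matvec(self.out, h)
            best = max(range(len(scores)), key=scores.__getitem__)
            if self.apis[best] == "<eos>":
                break
            result.append(self.apis[best])
            prev = one_hot(best, len(self.apis))
        return result


# Illustrative vocabularies only; a real system would learn these from data.
WORDS = ["parse", "xml", "file"]
APIS = ["DocumentBuilderFactory.newInstance",
        "DocumentBuilderFactory.newDocumentBuilder",
        "DocumentBuilder.parse"]

model = ToyEncoderDecoder(WORDS, APIS)
context = model.encode("parse xml file")
sequence = model.decode(context)
```

Because the encoder consumes words in order, `model.encode("parse xml file")` and `model.encode("file xml parse")` give different context vectors, whereas a bag-of-words representation would treat the two queries as identical. Without training, `sequence` is an arbitrary list of labels from `APIS`; the sketch is meant to show the data flow, not useful predictions.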

Information & Contributors

Information

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2016
1156 pages
ISBN: 9781450342186
DOI: 10.1145/2950290
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. API
  2. API usage
  3. RNN
  4. code search
  5. deep learning

Qualifiers

  • Research-article

Conference

FSE'16

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 241
  • Downloads (last 6 weeks): 13

Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Harnessing Test-Oriented Knowledge Graphs for Enhanced Test Function Recommendation. Electronics 13:8 (1547). DOI: 10.3390/electronics13081547. Online publication date: 18-Apr-2024
  • (2024) C2B: A Semantic Source Code Retrieval Model Using CodeT5 and Bi-LSTM. Applied Sciences 14:13 (5795). DOI: 10.3390/app14135795. Online publication date: 2-Jul-2024
  • (2024) Intelligent code search aids edge software development. Journal of Cloud Computing 13:1. DOI: 10.1186/s13677-024-00629-5. Online publication date: 1-Apr-2024
  • (2024) DeciX: Explain Deep Learning Based Code Generation Applications. Proceedings of the ACM on Software Engineering 1:FSE (2424-2446). DOI: 10.1145/3660814. Online publication date: 12-Jul-2024
  • (2024) Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning. Proceedings of the ACM on Software Engineering 1:FSE (2355-2377). DOI: 10.1145/3660811. Online publication date: 12-Jul-2024
  • (2024) A Survey of Source Code Search: A 3-Dimensional Perspective. ACM Transactions on Software Engineering and Methodology 33:6 (1-51). DOI: 10.1145/3656341. Online publication date: 28-Jun-2024
  • (2024) Compositional API Recommendation for Library-Oriented Code Generation. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (87-98). DOI: 10.1145/3643916.3644403. Online publication date: 15-Apr-2024
  • (2024) Representation Learning for Stack Overflow Posts: How Far Are We? ACM Transactions on Software Engineering and Methodology 33:3 (1-24). DOI: 10.1145/3635711. Online publication date: 15-Mar-2024
  • (2024) PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation. ACM Transactions on Software Engineering and Methodology 33:3 (1-30). DOI: 10.1145/3632745. Online publication date: 15-Mar-2024
  • (2024) Measurement of Embedding Choices on Cryptographic API Completion Tasks. ACM Transactions on Software Engineering and Methodology 33:3 (1-30). DOI: 10.1145/3625291. Online publication date: 15-Mar-2024
