Deep API learning

Authors:

Sunghun Kim

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Pages 631 - 642

https://doi.org/10.1145/2950290.2950334

Published: 01 November 2016

Abstract

Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags-of-words and lack a deep understanding of the semantics of the query. We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length context vector, and generates an API sequence based on the context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms the related approaches.
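The encode-then-generate flow described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the weights are random rather than trained, a plain tanh recurrence stands in for whatever recurrent unit the model uses, and the class name `ToyEncoderDecoder`, the helper `idf_weights`, and the vocabularies in the usage note are hypothetical. The abstract does not say how API importance is measured, so inverse document frequency is used here purely as an assumption.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix; a trained model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class ToyEncoderDecoder:
    """Untrained toy model: query words in, API names out (hypothetical)."""

    def __init__(self, words, apis, hidden=8, seed=0):
        rng = random.Random(seed)
        self.word_ids = {w: i for i, w in enumerate(words)}
        self.apis = ["<eos>"] + list(apis)  # index 0 marks start and end
        self.hidden = hidden
        self.enc_in = make_matrix(hidden, len(words), rng)
        self.enc_h = make_matrix(hidden, hidden, rng)
        self.dec_in = make_matrix(hidden, len(self.apis), rng)
        self.dec_h = make_matrix(hidden, hidden, rng)
        self.dec_ctx = make_matrix(hidden, hidden, rng)
        self.out = make_matrix(len(self.apis), hidden, rng)

    def encode(self, query):
        """Read the query word by word; the final hidden state is the
        fixed-length context vector, whatever the query length."""
        h = [0.0] * self.hidden
        for word in query.lower().split():
            if word not in self.word_ids:
                continue  # skip out-of-vocabulary words
            x = one_hot(self.word_ids[word], len(self.word_ids))
            h = [math.tanh(a + b)
                 for a, b in zip(matvec(self.enc_in, x), matvec(self.enc_h, h))]
        return h

    def decode(self, context, max_len=5):
        """Greedily emit one API per step, conditioned on the context vector."""
        h = list(context)
        prev = 0
        sequence = []
        for _ in range(max_len):
            x = one_hot(prev, len(self.apis))
            h = [math.tanh(a + b + c) for a, b, c in zip(
                matvec(self.dec_in, x),
                matvec(self.dec_h, h),
                matvec(self.dec_ctx, context))]
            scores = matvec(self.out, h)
            prev = max(range(len(scores)), key=scores.__getitem__)
            if prev == 0:  # end-of-sequence token chosen
                break
            sequence.append(self.apis[prev])
        return sequence


def idf_weights(api_sequences):
    """Assumed importance measure: log(N / n_api), so APIs that appear in
    every sequence get weight 0 and rarer APIs get larger weights."""
    total = len(api_sequences)
    counts = {}
    for seq in api_sequences:
        for api in set(seq):
            counts[api] = counts.get(api, 0) + 1
    return {api: math.log(total / n) for api, n in counts.items()}
```

For example, `ToyEncoderDecoder(["parse", "xml", "file"], ["DocumentBuilderFactory.newInstance", "DocumentBuilder.parse"])` builds a model whose `encode("parse xml file")` returns an 8-element vector. Because the weights are random, the decoded sequence is meaningless; the sketch only shows the data flow from query to context vector to API sequence.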

Index Terms

Deep API learning
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability


Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

November 2016

1156 pages

ISBN:9781450342186

DOI:10.1145/2950290

General Chair: Thomas Zimmermann (Microsoft Research, USA)

Program Chairs: Jane Cleland-Huang (University of Notre Dame, USA), Zhendong Su (University of California at Davis, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

FSE'16

Sponsor:

SIGSOFT

FSE'16: 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering

November 13 - 18, 2016

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%
