From word embeddings to document similarities for improved information retrieval in software engineering

X Ye, H Shen, X Ma, R Bunescu, C Liu

Proceedings of the 38th International Conference on Software Engineering, 2016 • dl.acm.org

The application of information retrieval techniques to search tasks in software engineering is made difficult by the lexical gap between search queries, usually expressed in natural language (e.g., English), and retrieved documents, usually expressed in code (e.g., programming languages). This is often the case in bug and feature location, community question answering, or more generally the communication between technical personnel and non-technical stakeholders in a software project. In this paper, we propose bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space. In the proposed architecture, word embeddings are first trained on API documents, tutorials, and reference documents, and then aggregated in order to estimate semantic similarities between documents. Empirical evaluations show that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly defined task of linking API documents to computer programming questions.
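The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not say how word embeddings are aggregated, so this example assumes the simplest common choice, averaging the word vectors of each document and comparing the averages by cosine similarity. The embedding table `TOY_EMBEDDINGS`, its values, and the function names are all hypothetical and hand-written for illustration; real embeddings would be trained on API documents, tutorials, and reference documents.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
# Real vectors would be learned from a software engineering corpus.
TOY_EMBEDDINGS = {
    "open":   [0.9, 0.1, 0.0],
    "read":   [0.8, 0.2, 0.1],
    "file":   [0.7, 0.3, 0.0],
    "stream": [0.6, 0.4, 0.1],
    "button": [0.0, 0.1, 0.9],
    "click":  [0.1, 0.0, 0.8],
}


def centroid(tokens, embeddings):
    """Average the vectors of all known tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(query_tokens, doc_tokens, embeddings=TOY_EMBEDDINGS):
    """Assumed aggregation: cosine similarity between averaged word vectors."""
    q = centroid(query_tokens, embeddings)
    d = centroid(doc_tokens, embeddings)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)


query = ["open", "file"]          # tokens from a natural-language question
io_snippet = ["read", "stream"]   # tokens from an I/O-related code snippet
ui_snippet = ["click", "button"]  # tokens from a UI-related code snippet

io_score = document_similarity(query, io_snippet)
ui_score = document_similarity(query, ui_snippet)
print(f"I/O snippet: {io_score:.3f}, UI snippet: {ui_score:.3f}")
```

In this toy setup the query shares no word with either snippet, yet the I/O snippet scores higher because its words lie close to the query's words in the vector space. That is the sense in which a shared representation space can bridge a lexical gap.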

ACM Digital Library

Cited by 376
