Augmenting and structuring user queries to support efficient free-form code search

Authors:

Raphael Sirres,

Tegawendé F. Bissyandé,

Yves Le Traon

ICSE '18: Proceedings of the 40th International Conference on Software Engineering

Page 945

https://doi.org/10.1145/3180155.3182513

Published: 27 May 2018

Abstract

Motivation: Code search is an important activity in software development because developers regularly search [6] for code examples dealing with diverse programming concepts, APIs, and specific platform peculiarities. To help developers search for source code, several Internet-scale code search engines, such as OpenHub [5] and Codota [1], have been proposed. Unfortunately, these engines have limited performance because they treat source code as natural language documents. To improve search performance, both the construction of the search space index and the mapping of queries onto it must address the challenge that "no single word can be chosen to describe a programming concept in the best way" [2]. This is known in the literature as the vocabulary mismatch problem [3].

Approach: We propose a novel approach to augmenting user queries in a free-form code search scenario. The approach aims to improve the quality of code examples returned by Internet-scale code search engines by building a Code voCaBulary (CoCaBu) [7]. The originality of CoCaBu is that it addresses the vocabulary mismatch problem by expanding, enriching, and re-targeting a user's free-form query, building on similar questions in Q&A sites, so that a code search engine can find highly relevant code in source code repositories. Figure 1 provides an overview of our approach.

The search process begins with a free-form query from a user, i.e., a sentence written in a natural language:

(a) For a given query, CoCaBu first searches for relevant posts in Q&A forums. The role of the Search Proxy is to forward the developer's free-form query to web search engines, which collect and rank the Q&A entries most relevant to the query.

(b) CoCaBu then generates an augmented query based on the information in the relevant posts, mainly leveraging the code snippets in those posts. The Code Query Generator creates a new query that includes not only the initial user query terms but also program elements. To accelerate this step, CoCaBu builds a snippet index for Q&A posts upfront.

(c) Once the augmented query is constructed, CoCaBu searches source files for code locations that match the query terms. For this step, we crawl a large number of repositories and build a code index of program elements in the source code upfront.
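
Steps (a) to (c) can be illustrated with a minimal Python sketch. It is not the CoCaBu or GitSearch implementation: the in-memory `QA_POSTS` and `CODE_FILES` data, the function names, the word-overlap ranking, and the regular expressions used to pick out program elements are all assumptions made for illustration, whereas the real system relies on web search engines and prebuilt snippet and code indices.

```python
import re

# Hypothetical toy data standing in for the Q&A snippet index and the code index.
QA_POSTS = [
    {"title": "How to read a text file line by line in Java",
     "snippet": "BufferedReader reader = new BufferedReader(new FileReader(path)); reader.readLine();"},
    {"title": "Generate a random number in Java",
     "snippet": "Random rng = new Random(); int n = rng.nextInt(10);"},
]
CODE_FILES = {
    "LogParser.java": "BufferedReader in = new BufferedReader(new FileReader(f)); String s = in.readLine();",
    "Dice.java": "Random r = new Random(); return r.nextInt(6) + 1;",
}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[A-Za-z]+", text.lower())


def search_posts(query, posts, top_k=1):
    """Step (a): rank Q&A posts by word overlap between query and post title."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(p["title"]))), p) for p in posts]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def program_elements(code):
    """Extract capitalized identifiers (type names) and called method names."""
    types = re.findall(r"\b[A-Z][A-Za-z0-9]*\b", code)
    calls = re.findall(r"\b([a-z][A-Za-z0-9]*)\s*\(", code)
    return set(types) | set(calls)


def augment_query(query, posts):
    """Step (b): keep the user's terms and add program elements from snippets."""
    elements = set()
    for p in posts:
        elements |= program_elements(p["snippet"])
    return {"terms": tokenize(query), "elements": sorted(elements)}


def search_code(augmented, files):
    """Step (c): rank source files by the number of matching program elements."""
    results = []
    for name, source in files.items():
        score = len(program_elements(source) & set(augmented["elements"]))
        if score > 0:
            results.append((score, name))
    results.sort(reverse=True)
    return [name for _, name in results]


query = "read file line by line"
augmented = augment_query(query, search_posts(query, QA_POSTS))
print(augmented["elements"])               # ['BufferedReader', 'FileReader', 'readLine']
print(search_code(augmented, CODE_FILES))  # ['LogParser.java']
```

In this toy setting the raw query shares no whole word with `LogParser.java` (its tokens are `bufferedreader`, `filereader`, `readline`, and so on), so plain term matching would miss the file; the program elements borrowed from the Q&A snippet are what bridge the gap.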

Contributions:

• CoCaBu approach to the vocabulary mismatch problem: We propose a technique for finding relevant code with free-form query terms that describe programming tasks, with no a priori knowledge of the API keywords to search for.

• GitSearch free-form search engine for GitHub: We instantiate the CoCaBu approach based on indices of Java files built from GitHub and Q&A posts from Stack Overflow to find the most relevant code examples for developer queries.

• Empirical user evaluation: Comparison with popular code search engines shows that GitSearch is more effective in returning acceptable code search results. In addition, comparison against web search engines indicates that GitSearch is a competitive alternative. Finally, via a live study, we show that users on Q&A sites may find GitSearch's real code examples acceptable as answers to developer questions.

Concluding remarks: As follow-up work, we have also leveraged Stack Overflow data to build a practical, novel, and efficient code-to-code search engine [4].

References

[1] Codota. 2016. http://www.codota.com. (Mar. 2016). Last accessed 12.03.2016.

[2] G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais. 1987. The Vocabulary Problem in Human-system Communication. Communications of the ACM (CACM) 30, 11 (Nov. 1987), 964-971.

[3] Sonia Haiduc, Gabriele Bavota, Andrian Marcus, Rocco Oliveto, Andrea De Lucia, and Tim Menzies. 2013. Automatic Query Reformulations for Text Retrieval in Software Engineering. In Proceedings of the 35th International Conference on Software Engineering (ICSE). 842-851.

[4] Kisub Kim, Dongsun Kim, Tegawende F. Bissyande, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY - A Code-to-Code Search Engine. In Proceedings of the 40th International Conference on Software Engineering (ICSE).

[5] OpenHub. 2016. http://code.openhub.net. (Mar. 2016). Last accessed 12.03.2016.

[6] Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How Developers Search for Code: A Case Study. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (FSE). 191-201.

[7] Raphael Sirres, Tegawende F. Bissyande, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering (Jan. 2018).

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering

May 2018

1307 pages

ISBN:9781450356381

DOI:10.1145/3180155

Conference Chair: Michel Chaudron (Chalmers University of Technology, University of Gothenburg, Sweden)

General Chair: Ivica Crnkovic (Chalmers University of Technology, University of Gothenburg, Sweden)

Program Chairs: Marsha Chechik (University of Toronto, Canada), Mark Harman (Facebook and University College London, United Kingdom)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Funding Sources

Fonds National de la Recherche (FNR)
Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant

Conference

ICSE '18

Sponsor:

SIGSOFT
IEEE-CS

ICSE '18: 40th International Conference on Software Engineering

May 27 - June 3, 2018

Gothenburg, Sweden

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
