DOI: 10.1145/3180155.3182513

Augmenting and structuring user queries to support efficient free-form code search

Published: 27 May 2018

Abstract

Motivation: Code search is an important activity in software development: developers regularly search for code examples dealing with diverse programming concepts, APIs, and platform-specific peculiarities [6]. To help developers search for source code, several Internet-scale code search engines, such as OpenHub [5] and Codota [1], have been proposed. Unfortunately, these engines have limited performance because they treat source code as natural-language documents. To improve search performance, both the construction of the search index and the mapping of queries onto that index must address the challenge that "no single word can be chosen to describe a programming concept in the best way" [2]. This is known in the literature as the vocabulary mismatch problem [3].
Approach: We propose a novel approach to augmenting user queries in a free-form code search scenario. The approach aims to improve the quality of code examples returned by Internet-scale code search engines by building a Code voCaBulary (CoCaBu) [7]. The originality of CoCaBu is that it addresses the vocabulary mismatch problem by expanding, enriching, and re-targeting a user's free-form query, building on similar questions in Q&A sites, so that a code search engine can find highly relevant code in source code repositories. Figure 1 provides an overview of our approach.
The search process begins with a free-form query from a user, i.e., a sentence written in natural language:
(a) For a given query, CoCaBu first searches for relevant posts in Q&A forums. Its Search Proxy forwards the developer's free-form query to web search engines, which collect and rank the Q&A entries most relevant to the query.
(b) CoCaBu then generates an augmented query from the information in the relevant posts, mainly leveraging the code snippets they contain. The Code Query Generator creates a new query that includes not only the initial user query terms but also program elements. To accelerate this step, CoCaBu builds a snippet index of Q&A posts upfront.
(c) Once the augmented query is constructed, CoCaBu searches source files for code locations that match the query terms. For this step, we crawl a large number of repositories and build a code index of the program elements in their source code upfront.
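Steps (b) and (c) can be illustrated with a minimal Python sketch. It is not CoCaBu's or GitSearch's implementation. Everything in it is a hypothetical stand-in: the regex-based extraction of program elements (capitalised identifiers as types, `.name(` as method calls), the frequency-based choice of the top-k elements, the stop-word list, the term-count ranking, and all names and data (`SNIPPETS`, `FILES`, `augment_query`, `build_code_index`, `search`). `SNIPPETS` plays the role of the code snippets that step (a) would retrieve from Q&A posts.

```python
import re
from collections import Counter, defaultdict

# Illustrative regexes (an assumption, not CoCaBu's actual parser):
# capitalised identifiers count as type names, ".name(" as method calls.
TYPE_RE = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")
CALL_RE = re.compile(r"\.([a-z][A-Za-z0-9_]*)\s*\(")
WORD_RE = re.compile(r"[A-Za-z]+")
STOP_WORDS = {"a", "an", "the", "by", "to", "in", "of", "how", "do", "i"}

# Hypothetical output of step (a): snippets from Q&A posts ranked as relevant.
SNIPPETS = [
    "BufferedReader br = new BufferedReader(new FileReader(path));\n"
    "String line = br.readLine();",
    "List<String> lines = Files.readAllLines(Paths.get(path));",
]

# Hypothetical crawled Java files (path -> source) used for the code index.
FILES = {
    "io/LineReader.java": "class LineReader { String first(BufferedReader br)"
                          " throws IOException { return br.readLine(); } }",
    "util/TextFile.java": "class TextFile { void read(File file) { } }",
    "net/Ping.java": "class Ping { boolean ok(InetAddress a)"
                     " throws IOException { return a.isReachable(500); } }",
}


def program_elements(code: str) -> Counter:
    """Count type names and method calls in a code fragment."""
    found = Counter("type:" + t for t in TYPE_RE.findall(code))
    found.update("call:" + c for c in CALL_RE.findall(code))
    return found


def augment_query(user_query: str, snippets: list[str], top_k: int = 4) -> list[str]:
    """Step (b) sketch: keep the user's non-stop words and add the
    program elements that occur most often in the Q&A snippets."""
    words = [w.lower() for w in WORD_RE.findall(user_query)]
    terms = ["word:" + w for w in words if w not in STOP_WORDS]
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(program_elements(snippet))
    terms += [element for element, _ in counts.most_common(top_k)]
    return list(dict.fromkeys(terms))  # drop duplicates, keep order


def build_code_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Step (c) preparation: inverted index from terms to file paths,
    built once, upfront, over the crawled source files."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, source in files.items():
        for element in program_elements(source):
            index[element].add(path)
        for word in WORD_RE.findall(source):
            index["word:" + word.lower()].add(path)
    return index


def search(index: dict[str, set[str]], query: list[str], limit: int = 3) -> list[str]:
    """Step (c) sketch: rank files by how many distinct query terms they contain."""
    scores: Counter = Counter()
    for term in dict.fromkeys(query):
        for path in index.get(term, ()):
            scores[path] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [path for path, _ in ranked[:limit]]


if __name__ == "__main__":
    query = augment_query("read a text file line by line", SNIPPETS)
    print(query)
    # ['word:read', 'word:text', 'word:file', 'word:line',
    #  'type:BufferedReader', 'type:String', 'type:FileReader', 'call:readLine']
    print(search(build_code_index(FILES), query))
    # ['io/LineReader.java', 'util/TextFile.java']
```

In this toy run, the user's words alone would match only `util/TextFile.java` (through "read" and "file"). The program elements borrowed from the Q&A snippets (`BufferedReader`, `String`, `readLine`) are what rank `io/LineReader.java` first. This is the kind of vocabulary bridging the approach targets, shown here with deliberately simple scoring.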
Contributions:
• CoCaBu approach to the vocabulary mismatch problem: We propose a technique for finding relevant code with free-form query terms that describe programming tasks, with no a priori knowledge of the API keywords to search for.
• GitSearch free-form search engine for GitHub: We instantiate the CoCaBu approach on indices built from Java files on GitHub and Q&A posts on Stack Overflow, to find the most relevant code examples for developer queries.
• Empirical user evaluation: A comparison with popular code search engines shows that GitSearch is more effective in returning acceptable code search results. In addition, a comparison against web search engines indicates that GitSearch is a competitive alternative. Finally, via a live study, we show that users on Q&A sites may find GitSearch's real code examples acceptable as answers to developer questions.
Concluding remarks: As follow-up work, we have also leveraged Stack Overflow data to build a practical, novel, and efficient code-to-code search engine [4].

References

[1] Codota. 2016. http://www.codota.com. (Mar. 2016). Last accessed 12.03.2016.
[2] G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais. 1987. The Vocabulary Problem in Human-system Communication. Communications of the ACM (CACM) 30, 11 (Nov. 1987), 964-971.
[3] Sonia Haiduc, Gabriele Bavota, Andrian Marcus, Rocco Oliveto, Andrea De Lucia, and Tim Menzies. 2013. Automatic Query Reformulations for Text Retrieval in Software Engineering. In Proceedings of the 35th International Conference on Software Engineering (ICSE). 842-851.
[4] Kisub Kim, Dongsun Kim, Tegawende F. Bissyande, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY - A Code-to-Code Search Engine. In Proceedings of the 40th International Conference on Software Engineering (ICSE).
[5] OpenHub. 2016. http://code.openhub.net. (Mar. 2016). Last accessed 12.03.2016.
[6] Caitlin Sadowski, Kathryn T. Stolee, and Sebastian Elbaum. 2015. How Developers Search for Code: A Case Study. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (FSE). 191-201.
[7] Raphael Sirres, Tegawende F. Bissyande, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, and Yves Le Traon. 2018. Augmenting and structuring user queries to support efficient free-form code search. Empirical Software Engineering (Jan. 2018).

Information

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering
May 2018
1307 pages
ISBN:9781450356381
DOI:10.1145/3180155
  • Conference Chair: Michel Chaudron
  • General Chair: Ivica Crnkovic
  • Program Chairs: Marsha Chechik, Mark Harman
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Abstract

Funding Sources

  • Fonds National de la Recherche (FNR)
  • Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant

Conference

ICSE '18

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 08 Feb 2025

Cited By

  • (2024) Code Recommendation for Schema Evolution of Mimic Storage Systems. International Journal of Software Engineering and Knowledge Engineering 35:01 (89-110). DOI: 10.1142/S0218194024500499. Online publication date: 28-Oct-2024.
  • (2023) Survey of Code Search Based on Deep Learning. ACM Transactions on Software Engineering and Methodology 33:2 (1-42). DOI: 10.1145/3628161. Online publication date: 23-Dec-2023.
  • (2023) Code Search: A Survey of Techniques for Finding Code. ACM Computing Surveys 55:11 (1-31). DOI: 10.1145/3565971. Online publication date: 9-Feb-2023.
  • (2023) Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We? IEEE Transactions on Software Engineering 49:4 (1876-1897). DOI: 10.1109/TSE.2022.3197063. Online publication date: 1-Apr-2023.
  • (2023) Intelligent Software Maintenance. In Optimising the Software Development Process with Artificial Intelligence (241-275). DOI: 10.1007/978-981-19-9948-2_9. Online publication date: 20-Jul-2023.
  • (2021) AI in Software Engineering at Facebook. IEEE Software 38:4 (52-61). DOI: 10.1109/MS.2021.3061664. Online publication date: 1-Jul-2021.
  • (2020) Semantic code search via equational reasoning. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (1066-1082). DOI: 10.1145/3385412.3386001. Online publication date: 11-Jun-2020.
  • (2020) ProSy: API-Based Synthesis with Probabilistic Model. Journal of Computer Science and Technology 35:6 (1234-1257). DOI: 10.1007/s11390-020-0520-4. Online publication date: 1-Nov-2020.
  • (2019) Supporting code search with context-aware, analytics-driven, effective query reformulation. Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings (226-229). DOI: 10.1109/ICSE-Companion.2019.00088. Online publication date: 25-May-2019.
  • (2019) Automatic query reformulation for code search using crowdsourced knowledge. Empirical Software Engineering 24:4 (1869-1924). DOI: 10.1007/s10664-018-9671-0. Online publication date: 1-Aug-2019.