research-article

CodeCatch: extracting source code snippets from online sources

Authors: Themistoklis Diamantopoulos, Georgios Karagiannopoulos, Andreas L. Symeonidis

RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering

Pages 21 - 27

https://doi.org/10.1145/3194104.3194107

Published: 28 May 2018

Abstract

Nowadays, developers rely on online sources to find example snippets that address the programming problems they are trying to solve. However, contemporary API usage mining methods are not suitable for locating easily reusable snippets: they provide usage examples for specific APIs and thus require the developer to know beforehand which library to use. On the other hand, approaches that retrieve snippets from online sources usually output a list of examples, without helping the developer distinguish among different implementations and without offering any insight into the quality and reusability of the proposed snippets. In this work, we present CodeCatch, a system that receives queries in natural language and extracts snippets from multiple online sources. The snippets are assessed both for their quality and for their usefulness/preference by developers, and they are also clustered according to their API calls so that the developer can select among the different implementations. A preliminary evaluation of CodeCatch on a set of indicative programming problems indicates that it can be a useful tool for developers.
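The abstract states that retrieved snippets are clustered according to their API calls, but this page does not describe the representation or algorithm CodeCatch actually uses. The following is therefore only a minimal sketch under stated assumptions: each snippet has already been reduced to a set of API call names (the extraction step is not shown), similarity between two snippets is the Jaccard index of those sets, and snippets are grouped by single-linkage clustering with a hypothetical threshold of 0.5. The function names `jaccard` and `cluster_by_api_calls`, the threshold, and the example API-call names are all illustrative, not taken from the paper.

```python
from itertools import combinations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two API-call sets (defined as 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_api_calls(
    snippets: dict[str, set[str]], threshold: float = 0.5
) -> list[list[str]]:
    """Group snippet ids whose API-call sets are similar.

    Hypothetical single-linkage scheme (not necessarily CodeCatch's): two
    snippets are linked when their Jaccard similarity is >= threshold, and each
    cluster is a connected component of that similarity graph (via union-find).
    """
    ids = sorted(snippets)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(frozenset(snippets[a]), frozenset(snippets[b])) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Largest groups first; each group stands for one distinct implementation.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Hypothetical API calls extracted from four Java snippets
    # answering "read a file line by line".
    snippets = {
        "s1": {"FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine"},
        "s2": {"FileReader.<init>", "BufferedReader.<init>",
               "BufferedReader.readLine", "BufferedReader.close"},
        "s3": {"Paths.get", "Files.readAllLines"},
        "s4": {"File.<init>", "Scanner.<init>", "Scanner.hasNextLine", "Scanner.nextLine"},
    }
    print(cluster_by_api_calls(snippets))
    # [['s1', 's2'], ['s3'], ['s4']]
```

Here the two `BufferedReader` snippets share three of four calls (similarity 0.75) and form one group, while the `Files` and `Scanner` variants stay separate, giving the developer three alternative implementations to choose from. Single linkage can chain loosely related snippets together; an average-linkage scheme or k-means over weighted API-call vectors could be substituted, and ranking snippets inside each group by quality or usefulness scores is not shown.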

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
  2. Information systems applications
    1. Data mining
      1. Clustering
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability

Published In

RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering

May 2018

67 pages

ISBN:9781450357234

DOI:10.1145/3194104

Conference Chairs:
Walter F. Tichy, Karlsruhe Institute of Technology, Germany
Leandro Minku, University of Leicester, UK

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 May 2018

Qualifiers

Research-article

Conference

ICSE '18

Sponsor:

SIGSOFT
IEEE-CS

ICSE '18: 40th International Conference on Software Engineering

May 28 - 29, 2018

Gothenburg, Sweden

Bibliometrics

Total Citations: 4
Total Downloads: 174
Downloads (last 12 months): 14
Downloads (last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Cited By

N. Bibi, T. Rana, A. Maqbool, F. Afzal, A. Akgül, and M. De la Sen. 2023. An Intelligent Platform for Software Component Mining and Retrieval. Sensors 23, 1 (2023), 525. Online publication date: 3 Jan 2023.
https://doi.org/10.3390/s23010525

L. Di Grazia and M. Pradel. 2023. Code Search: A Survey of Techniques for Finding Code. ACM Computing Surveys 55, 11 (2023), 1-31. Online publication date: 9 Feb 2023.
https://dl.acm.org/doi/10.1145/3565971

A. Rocha and M. Maia. 2023. Mining relevant solutions for programming tasks from search engine results. IET Software 17, 4 (2023), 455-471. Online publication date: 14 Jun 2023.
https://dl.acm.org/doi/10.1049/sfw2.12127

T. Diamantopoulos, N. Oikonomou, and A. Symeonidis. 2020. Extracting Semantics from Question-Answering Services for Snippet Reuse. In Fundamental Approaches to Software Engineering, 119-139. Online publication date: 25 Apr 2020.
https://dl.acm.org/doi/10.1007/978-3-030-45234-6_6
