Enriching Documents with Examples: A Corpus Mining Approach

Published: 01 January 2013

Abstract

Software developers increasingly rely on information from the Web, such as documents or code examples on application programming interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand how to use the APIs, and searching for good code examples requires considerable effort.
To address this problem, we propose a novel code example recommendation system that combines the strengths of browsing documents and searching for code examples, returning API documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.

    Reviews

    Scott Arthur Moody

    For a compiler fan, it is great to see a system that parses code into abstract syntax trees (ASTs) to find, relate, and then generate semantically relevant and illustrative code samples. This paper describes a new data mining approach, called eXoaDocs, which automatically generates these code samples and relates them to application programming interface (API) program descriptions, resulting in enriched, example-based programming documents. By automatically creating semantically relevant code samples, the authors' system not only omits irrelevant code but also organizes the samples by criteria such as representativeness, frequency, conciseness, and correctness. Their browser also supports popularity ranking to help end users find the best code examples.

    This extensive paper provides detailed descriptions of the algorithms for organizing code samples, and contrasts clustering, ranking, and hybrid approaches. Other successful documentation approaches rely on manually developed, high-quality examples, but an automated approach would be valuable when dealing with massive amounts of code. eXoaDocs is compared to other code search engines and documentation approaches.

    As a test, it was run on the extensive Java Development Kit (JDK) 5 source. Illustrative code documentation samples were generated for 75 percent of the code (27,000 methods). In contrast, the traditional Java documents (JavaDocs) toolset generated illustrative samples for only 2 percent of the same code. To validate the approach, a user study was conducted in which numerous students were given sample problems to program. Those who had access to the eXoaDocs semantic examples had measurable productivity gains. The authors also nicely identify areas where the validity of their approach could be threatened, but it looks like this approach could play a role in future code documentation and browsing tools.

    Online Computing Reviews Service
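
The review mentions organizing code samples by criteria such as frequency and conciseness. The toy sketch below shows one way such a ranking could look. It is not the eXoaDocs algorithm: the `Candidate` type, the `rank_candidates` function, and the scoring rule are all hypothetical, chosen only to illustrate the idea of ordering mined snippets by how common their API usage is and how short they are.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical mined snippet: its source text and the API calls it makes."""
    source: str
    api_calls: tuple  # ordered API method names extracted from the snippet


def rank_candidates(candidates):
    """Return candidates ordered best first by an illustrative score.

    Frequency: how many candidates share the same API call sequence
    (an assumed stand-in for "representativeness").
    Conciseness: fewer non-blank source lines ranks higher.
    """
    usage_counts = Counter(c.api_calls for c in candidates)

    def line_count(c):
        return sum(1 for line in c.source.splitlines() if line.strip())

    # Sort by descending frequency, then ascending length.
    return sorted(
        candidates,
        key=lambda c: (-usage_counts[c.api_calls], line_count(c)),
    )


if __name__ == "__main__":
    # Made-up snippets purely for demonstration.
    samples = [
        Candidate("r = open_reader(f)\nlog(r)\nr.read()\nr.close()",
                  ("open_reader", "read", "close")),
        Candidate("r = open_reader(f)\nr.read()\nr.close()",
                  ("open_reader", "read", "close")),
        Candidate("r = open_reader(f)", ("open_reader",)),
    ]
    for c in rank_candidates(samples):
        print(c.api_calls, "|", c.source.replace("\n", "; "))
```

In this sketch the two snippets that share the most common call sequence come first, with the shorter of the two ahead; a real system would need far richer signals, such as the AST-based analysis the review describes.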

    Information & Contributors

    Information

    Published In

    ACM Transactions on Information Systems  Volume 31, Issue 1
    January 2013
    163 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/2414782
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 January 2013
    Accepted: 01 September 2012
    Revised: 01 July 2012
    Received: 01 June 2011
    Published in TOIS Volume 31, Issue 1

    Author Tags

    1. API document
    2. clustering
    3. code search
    4. ranking

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 01 Sep 2024

    Cited By

    • (2024) Towards Generating Maintainable and Comprehensible API Code Examples. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 830-834. DOI: 10.1109/SANER60148.2024.00090. Online publication date: 12-Mar-2024
    • (2024) Richen: Automated enrichment of Git documentation with usage examples and scenarios. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2662. Online publication date: 13-Mar-2024
    • (2022) Highlighting Current Issues in API Usage Mining to Enhance Software Reusability. WSEAS Transactions on Computer Research 10, 29-34. DOI: 10.37394/232018.2022.10.4. Online publication date: 22-Mar-2022
    • (2022) Synthesising Linear API Usage Examples for API Documentation. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 607-611. DOI: 10.1109/ICSME55016.2022.00084. Online publication date: Oct-2022
    • (2021) Enriching API Documentation with Code Samples and Usage Scenarios from Crowd Knowledge. IEEE Transactions on Software Engineering 47:6, 1299-1314. DOI: 10.1109/TSE.2019.2919304. Online publication date: 1-Jun-2021
    • (2021) APIzation. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 542-554. DOI: 10.1109/ASE51524.2021.9678576. Online publication date: 15-Nov-2021
    • (2021) Characterizing top ranked code examples in Google. Journal of Systems and Software 178, 110971. DOI: 10.1016/j.jss.2021.110971. Online publication date: Aug-2021
    • (2021) Discovering API Directives from API Specifications with Text Classification. Journal of Computer Science and Technology 36:4, 922-943. DOI: 10.1007/s11390-021-0235-1. Online publication date: 30-Jul-2021
    • (2021) An Empirical Comparison Between Tutorials and Crowd Documentation of Application Programming Interface. Journal of Computer Science and Technology 36:4, 856-876. DOI: 10.1007/s11390-020-0042-0. Online publication date: 30-Jul-2021
    • (2020) ERF: An Empirical Recommender Framework for Ascertaining Appropriate Learning Materials from Stack Overflow Discussions. Computers 9:3, 57. DOI: 10.3390/computers9030057. Online publication date: 20-Jul-2020
