Enriching Documents with Examples: A Corpus Mining Approach

Authors: Jinhan Kim, Sanghoon Lee, Seung-Won Hwang, Sunghun Kim

ACM Transactions on Information Systems (TOIS), Volume 31, Issue 1

Article No.: 1, Pages 1 - 27

https://doi.org/10.1145/2414782.2414783

Published: 01 January 2013

Abstract

Software developers increasingly rely on information from the Web, such as documents or code examples on application programming interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand how to use the APIs, and searching for good code examples requires considerable effort.

To address this problem, we propose a novel code example recommendation system that combines the strengths of browsing documents and searching for code examples and returns API documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.

Index Terms

Enriching Documents with Examples: A Corpus Mining Approach
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Reviews

Reviewer: Scott Arthur Moody

For a compiler fan, it's great seeing a system that parses code into abstract syntax trees (ASTs) to find, relate, and then generate semantically relevant and illustrative code samples. This paper describes a new data mining approach, called eXoaDocs, which automatically generates and relates these code samples to application programming interface (API) program descriptions, resulting in enriched example-based programming documents.

By automatically creating semantically relevant code samples, the authors' system omits irrelevant code, but also organizes based on various criteria such as representativeness, frequency, conciseness, and correctness. Their browser also supports popularity ranking to help end users find the best code examples. This extensive paper provides detailed descriptions of their algorithms for organizing code samples, while contrasting clustering, ranking, and hybrid approaches.

Although other successful documentation approaches rely on manually developed, high-quality examples, when dealing with massive magnitudes of code, an automation approach would be valuable. eXoaDocs is compared to other code search engines and documentation approaches. As a test, it was run on the extensive Java Development Kit (JDK) 5 source. Illustrative code documentation samples were generated for 75 percent of the code (27,000 methods). In contrast, the traditional Java documents (JavaDocs) toolset only generated illustrative samples for 2 percent of the same code.

To validate their approach, a user study was conducted where numerous students were given sample problems to program. Those that had access to the eXoaDocs semantic examples had measurable productivity gains. The authors also nicely identify areas where the validity of their approach could be threatened, but it looks like this approach could play a role in future code documentation and browsing tools.

Online Computing Reviews Service
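Two of the organizing criteria named in the review, representativeness and conciseness, can be pictured with a small sketch. This is not the eXoaDocs algorithm: the token-overlap (Jaccard) similarity, the line-count conciseness score, the `weight` parameter, and every function name below are assumptions chosen purely for illustration, and the real system works on ASTs rather than raw tokens.

```python
import re


def tokens(snippet: str) -> set[str]:
    """Return the set of identifier-like tokens in a snippet (hypothetical helper)."""
    return set(re.findall(r"[A-Za-z_]\w*", snippet))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_examples(snippets: list[str], weight: float = 0.5) -> list[str]:
    """Order candidate snippets by a blend of two assumed scores.

    representativeness: mean Jaccard similarity to the other candidates
    conciseness:        1 for a one-line snippet, lower for longer ones
    """
    if not snippets:
        return []
    token_sets = [tokens(s) for s in snippets]
    max_lines = max(len(s.splitlines()) for s in snippets)
    scored = []
    for i, snippet in enumerate(snippets):
        sims = [jaccard(token_sets[i], t) for j, t in enumerate(token_sets) if j != i]
        representativeness = sum(sims) / len(sims) if sims else 0.0
        conciseness = 1.0 - (len(snippet.splitlines()) - 1) / max_lines
        score = weight * representativeness + (1 - weight) * conciseness
        scored.append((score, snippet))
    # Stable sort: candidates with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored]


if __name__ == "__main__":
    short = "BufferedReader r = new BufferedReader(new FileReader(f));\nString line = r.readLine();"
    long_ = short + "\nr.close();\nlog(line);\naudit(line);"
    unrelated = "int x = 1;"
    for example in rank_examples([long_, unrelated, short]):
        print(example.splitlines()[0])
```

Setting `weight` closer to 1 favors snippets that resemble many other candidates, while a value closer to 0 favors short snippets; a clustering or hybrid scheme, which the review says the paper also contrasts, would need a different structure.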

Information

Published In

ACM Transactions on Information Systems Volume 31, Issue 1

January 2013

163 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/2414782

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 2013

Accepted: 01 September 2012

Revised: 01 July 2012

Received: 01 June 2011

Published in TOIS Volume 31, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Ministry of Education, Science and Technology

Article Metrics

Total Citations: 47
Total Downloads: 820
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Sep 2024

Cited By

Alharbi S, Kolovos D, Matragkas N (2024). Towards Generating Maintainable and Comprehensible API Code Examples. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 830-834. Online publication date: 12-Mar-2024. https://doi.org/10.1109/SANER60148.2024.00090

Shen C, Yang W, Jia H, Pan M, Zhou Y (2024). Richen: Automated enrichment of Git documentation with usage examples and scenarios. Journal of Software: Evolution and Process. Online publication date: 13-Mar-2024. https://doi.org/10.1002/smr.2662

M. Ishag M, Park H, Li D, Ryu K (2022). Highlighting Current Issues in API Usage Mining to Enhance Software Reusability. WSEAS Transactions on Computer Research 10, 29-34. Online publication date: 22-Mar-2022. https://doi.org/10.37394/232018.2022.10.4

Alharbi S, Kolovos D, Matragkas N (2022). Synthesising Linear API Usage Examples for API Documentation. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 607-611. Online publication date: Oct-2022. https://doi.org/10.1109/ICSME55016.2022.00084

Zhang J, Jiang H, Ren Z, Zhang T, Huang Z (2021). Enriching API Documentation with Code Samples and Usage Scenarios from Crowd Knowledge. IEEE Transactions on Software Engineering 47:6, 1299-1314. Online publication date: 1-Jun-2021. https://doi.org/10.1109/TSE.2019.2919304

Terragni V, Salza P, Grundy J (2021). APIzation. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, 542-554. Online publication date: 15-Nov-2021. https://dl.acm.org/doi/10.1109/ASE51524.2021.9678576

Hora A (2021). Characterizing top ranked code examples in Google. Journal of Systems and Software 178, 110971. Online publication date: Aug-2021. https://doi.org/10.1016/j.jss.2021.110971

Zhang J, Tao C, Huang Z, Chen X (2021). Discovering API Directives from API Specifications with Text Classification. Journal of Computer Science and Technology 36:4, 922-943. Online publication date: 30-Jul-2021. https://doi.org/10.1007/s11390-021-0235-1

Tang Y, Ren Z, Jiang H, Li X, Kong W (2021). An Empirical Comparison Between Tutorials and Crowd Documentation of Application Programming Interface. Journal of Computer Science and Technology 36:4, 856-876. Online publication date: 30-Jul-2021. https://doi.org/10.1007/s11390-020-0042-0

Iqbal A, Khatun S, Arefin M, Dewan M (2020). ERF: An Empirical Recommender Framework for Ascertaining Appropriate Learning Materials from Stack Overflow Discussions. Computers 9:3, 57. Online publication date: 20-Jul-2020. https://doi.org/10.3390/computers9030057
