Correlation-based software search by leveraging software term database

Published: 01 October 2018

Abstract

Internet-scale open source software (OSS) production in various communities generates abundant reusable resources for software developers. However, finding the desired, mature software among a considerable number of candidates with keyword queries is a significant challenge, especially for novices, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from the STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions, one in each of four experiments, to evaluate the effectiveness of the STDB and investigate the performance of CBSS. The experimental results show that the proposed CBSS responds effectively to keyword-based software searches and significantly outperforms existing search services at finding mature software.
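
The abstract does not state how term relevance is computed, so the following is only a rough sketch of the general idea: derive a relevance score between terms from how often they co-occur as tags on the same post, then use those scores to retrieve software whose tags are related to the query term. The Jaccard measure, the function names `build_term_relevance` and `correlated_search`, and the toy data are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations


def build_term_relevance(tagged_posts):
    """Build a toy term database: Jaccard relevance between co-occurring tags.

    Jaccard is an assumed measure; the abstract does not name one.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_posts:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    relevance = {}
    for (a, b), both in pair_count.items():
        # Posts carrying both tags divided by posts carrying either tag.
        score = both / (tag_count[a] + tag_count[b] - both)
        relevance.setdefault(a, {})[b] = score
        relevance.setdefault(b, {})[a] = score
    return relevance


def correlated_search(query, software_tags, relevance):
    """Rank software by the best relevance of any of its tags to the query.

    An exact tag match scores 1.0; unrelated software is left out.
    """
    related = relevance.get(query, {})
    scores = {}
    for name, tags in software_tags.items():
        score = max(
            (1.0 if t == query else related.get(t, 0.0) for t in tags),
            default=0.0,
        )
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical tagging data and software catalogue.
posts = [
    ["python", "django", "orm"],
    ["python", "flask"],
    ["django", "orm"],
    ["java", "spring"],
]
software = {
    "sqlalchemy": ["orm", "python"],
    "spring-boot": ["java", "spring"],
    "flask-admin": ["flask"],
}

relevance = build_term_relevance(posts)
results = correlated_search("django", software, relevance)
```

In this toy data no software is tagged "django", yet the query still returns a candidate through the related term "orm", which is the kind of match a purely literal keyword search would miss.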

    Published In

    Frontiers of Computer Science: Selected Publications from Chinese Universities  Volume 12, Issue 5
    October 2018
    213 pages
    ISSN:2095-2228
    EISSN:2095-2236
    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. open source software
    2. software retrieval
    3. software term database
