
Correlation-based software search by leveraging software term database

Published: 01 October 2018


Internet-scale open source software (OSS) production across many communities generates abundant reusable resources for software developers. However, finding the desired, mature software among a large number of candidates with keyword queries is a significant challenge, especially for newcomers, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow, and we propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from the STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions, one in each of four experiments, to evaluate the effectiveness of the STDB and investigate the performance of CBSS. The experimental results show that the proposed CBSS responds effectively to keyword-based software searches and significantly outperforms existing search services at finding mature software.
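The idea of deriving term relevance from tagging data and using it for retrieval can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify how the STDB computes relevance or how CBSS ranks results, so this sketch assumes Jaccard similarity over tag co-occurrence as a stand-in relevance measure and a simple summed score for ranking. The function names (`build_term_relevance`, `search`) and the toy posts and projects are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_term_relevance(tagged_posts):
    """Return {(a, b): score} for every pair of tags that co-occur.

    The score is the Jaccard similarity of the two tags' post sets,
    an assumed stand-in for the relevance stored in a term database.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_posts:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    relevance = {}
    for (a, b), both in pair_count.items():
        score = both / (tag_count[a] + tag_count[b] - both)
        relevance[(a, b)] = score
        relevance[(b, a)] = score  # store both directions for easy lookup
    return relevance


def search(query_terms, projects, relevance):
    """Rank project names by summed relevance to the query terms.

    An exact term match counts as 1.0; otherwise the looked-up
    relevance is used, defaulting to 0.0 for unrelated terms.
    """
    scores = {}
    for name, terms in projects.items():
        total = 0.0
        for q in query_terms:
            for t in terms:
                total += 1.0 if q == t else relevance.get((q, t), 0.0)
        scores[name] = total
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): tag lists of four posts and two tagged projects.
posts = [
    ["java", "orm", "hibernate"],
    ["orm", "hibernate", "database"],
    ["python", "orm", "sqlalchemy"],
    ["python", "web", "django"],
]
projects = {
    "hibernate-orm": ["hibernate", "java"],
    "django": ["django", "web", "python"],
}

relevance = build_term_relevance(posts)
ranking = search(["orm"], projects, relevance)
print(ranking)
```

In this toy run the query term "orm" appears in neither project's term list, yet the project tagged "hibernate" and "java" ranks first because those tags co-occur with "orm" in the posts. That is the kind of semantic match plain keyword lookup would miss.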



Cited By

  • (2023) A Systematic Review of Automated Query Reformulations in Source Code Search. ACM Transactions on Software Engineering and Methodology, 32:6 (1-79). DOI: 10.1145/3607179. Online publication date: 4-Jul-2023



    Published In

    Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 12, Issue 5
    October 2018
    213 pages



    Berlin, Heidelberg

    Author Tags

    1. open source software
    2. software retrieval
    3. software term database

