Abstract
Third-party libraries are an integral part of many software projects. Developers often need to find analogical libraries, that is, libraries that provide features comparable to those of libraries they already know, but for a different programming language or mobile platform. Existing ways of finding analogical libraries rely on community-curated lists of libraries, blogs, or Q&A posts, which often contain overwhelming or out-of-date information. In this paper, we present a new approach that recommends analogical libraries based on a knowledge base of analogical libraries mined from the tags of millions of Stack Overflow questions. The novelty of our approach is that it solves analogical-library questions by combining a state-of-the-art word embedding technique with domain-specific relational and categorical knowledge mined from Stack Overflow. Given a library and a recommended analogical library, our approach further extracts Stack Overflow questions and answer snippets that compare the two libraries, which can offer useful information scents for developers investigating the recommendation further. We implemented our approach in a proof-of-concept web application, and more than 34.8 thousand users visited our website from November 2015 to August 2017. Our evaluation shows that our approach recommends analogical libraries accurately. We also demonstrate the usefulness of the recommendations by using them to answer analogical-library questions on Stack Overflow. Google Analytics data on our website traffic and an analysis of visitors’ interactions with the website content provide insights into the usage patterns and the system design of our web application.
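To make the idea of an analogical-library question concrete, the following is a minimal sketch of analogy reasoning over tag embeddings, filtered by categorical knowledge. It is not the authors’ implementation: the three-dimensional vectors, the category labels, and the function names are all invented for illustration, and a real system would learn embeddings from Stack Overflow tags.

```python
import math

# Toy 3-d tag embeddings, invented purely for illustration (not from the paper).
EMBEDDINGS = {
    "python": [1.0, 0.0, 0.0],
    "java": [0.0, 1.0, 0.0],
    "nltk": [1.0, 0.0, 1.0],
    "opennlp": [0.0, 1.0, 1.0],
    "junit": [0.0, 1.0, -1.0],
}

# Hypothetical categorical knowledge: whether a tag names a language or a library.
CATEGORY = {
    "python": "language",
    "java": "language",
    "nltk": "library",
    "opennlp": "library",
    "junit": "library",
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogical_library(library, source_lang, target_lang):
    """Answer "library is to source_lang as ? is to target_lang"."""
    # Vector-offset analogy: library - source_lang + target_lang.
    query = [
        lib - src + tgt
        for lib, src, tgt in zip(
            EMBEDDINGS[library], EMBEDDINGS[source_lang], EMBEDDINGS[target_lang]
        )
    ]
    # Categorical filter: only other tags labelled as libraries are candidates.
    candidates = [
        tag for tag in EMBEDDINGS
        if CATEGORY[tag] == "library" and tag != library
    ]
    return max(candidates, key=lambda tag: cosine(query, EMBEDDINGS[tag]))


print(analogical_library("nltk", "python", "java"))
```

With these toy vectors, the query for `nltk` moved from `python` to `java` lands closest to `opennlp`; the category filter keeps language tags such as `java` itself out of the candidate set.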
Notes
A complete list can be found at https://graphofknowledge.appspot.com/libCategory
The exact threshold that separates popular from unpopular queries is not disclosed by Google.
The list of sampled questions can be found at https://graphofknowledge.appspot.com/questions
Because most search engine robots do not execute JavaScript, robot traffic is not counted in Google Analytics (Google 2016).
References
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM sigmod record, vol 22. ACM, pp 207–216
Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, VLDB, pp 487–499
Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654
Bird S (2006) Nltk: the natural language toolkit. In: Proceedings of the COLING/ACL on interactive presentation sessions. Association for Computational Linguistics, pp 69–72
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10):P10008
Chan WK, Cheng H, Lo D (2012) Searching connected api subgraph via text phrases. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering. ACM, p 10
Chen C, Xing Z (2016a) Similartech: automatically recommend analogical libraries across different programming languages. In: 2016 31st IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 834–839
Chen C, Xing Z (2016b) Towards correlating search on google and asking on stack overflow. In: The 40th IEEE computer society international conference on computers, software & applications. IEEE, pp 83–92
Chen W, Zhang Y, Zhang M (2014) Feature embedding for dependency parsing. In: Proceedings of the international conference on computational linguistics
Chen C, Gao S, Xing Z (2016a) Mining analogical libraries in q&a discussions–incorporating relational and categorical knowledge into word embedding. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1. IEEE, pp 338–348
Chen C, Xing Z, Han L (2016b) Techland: assisting technology landscape inquiries with insights from stack overflow. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 356–366
Chen G, Chen C, Xing Z, Xu B (2016c) Learning a dual-language vector space for domain-specific cross-lingual question retrieval. In: Proceedings of the 31st IEEE/ACM International Conference On Automated Software Engineering. ACM, pp 744–755
Chen C, Xing Z, Liu Y (2017a) By the community & for the community: a deep learning approach to assist collaborative editing in q&a sites. PACMHCI 1 (CSCW):32:1–32:21
Chen C, Xing Z, Wang X (2017b) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 450–461
Chen C, Chen X, Sun J, Xing Z, Li G (2018) Data-driven proactive policy assurance of post quality in community q&a sites. vol 2, pp 33:1–32:22
Cilibrasi RL, Vitanyi P (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383
Deshmukh J, Podder S, Sengupta S, Dubash N et al (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 115–124
Gligorov R, ten Kate W, Aleksovski Z, Van Harmelen F (2007) Using google distance to weight approximate ontology matches. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 767–776
Google (2015) Google trends. https://www.google.com.sg/trends/
Google (2016) Google analytics policy. https://support.google.com/analytics/answer/1315708?hl=en
Huang Y, Chen C, Xing Z, Lin T, Liu Y (2018) Tell them apart: distilling technology differences from crowd-scale comparison discussions. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. ACM, pp 214–224
Kazama J, Torisawa K (2007) Exploiting wikipedia as external knowledge for named entity recognition. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 698–707
Li G, Zhu H, Lu T, Ding X, Gu N (2015) Is it good to be like wikipedia?: Exploring the trade-offs of introducing collaborative editing model to q&a sites. In: Proceedings of the 18th ACM conference on computer supported cooperative work & social computing. ACM, pp 1080–1091
Manning CD, Raghavan P, Schütze H et al (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado G S, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Mikolov T, Yih WT, Zweig G (2013c) Linguistic regularities in continuous space word representations. HLT-NAACL 746–751
Nasehi SM, Sillito J, Maurer F, Burns C (2012) What makes a good code example?: A study of programming q&a in stackoverflow. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 25–34
Nguyen TT, Nguyen AT, Nguyen HA (2013) A statistical semantic language model for source code. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 532–542
Nguyen AT, Nguyen HA, Nguyen TT, Nguyen TN (2014) Statistical learning approach for mining api usage mappings for code migration. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering. ACM, pp 457–468
Nguyen TD, Nguyen AT, Nguyen TN (2016) Mapping api elements for code migration with vector representations. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 756–758
Nguyen TD, Nguyen AT, Phan HD, Nguyen TN (2017) Exploring api embedding for api usages and applications. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 438–449
Student (1908) The probable error of a mean. Biometrika VI:1–25
Teyton C, Falleri JR, Blanc X (2012) Mining library migration graphs. In: 2012 19th working conference on reverse engineering (WCRE). IEEE, pp 289–298
Teyton C, Falleri JR, Blanc X (2013) Automatic discovery of function mappings between similar libraries. In: 2013 20th working conference on reverse engineering (WCRE). IEEE, pp 192–201
Teyton C, Falleri JR, Palyart M, Blanc X (2014) A study of library migrations in java. J Softw: Evol Process 26(11):1030–1052
Thummalapenta S, Xie T (2007) Parseweb: a programmer assistant for reusing open source code on the web. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 204–213
Thung F, Lo D, Lawall J (2013a) Automated library recommendation. In: 2013 20th working conference on reverse engineering (WCRE). IEEE, pp 182–191
Thung F, Wang S, Lo D, Lawall J (2013b) Automatic recommendation of api methods from feature requests. In: 2013 IEEE/ACM 28th international conference on automated software engineering (ASE). IEEE, pp 290–300
Turney PD (2006) Similarity of semantic relations. Comput Linguist 32(3):379–416
Van Nguyen T, Nguyen AT, Nguyen TN (2016) Characterizing api elements in software documentation with vector representation. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 749–751
Vasilescu B, Serebrenik A, Goeminne M, Mens T (2014) On the variation and specialisation of workload—a case study of the gnome ecosystem community. Empir Softw Eng 19(4):955–1008
Vu PM, Nguyen TT, Pham HV, Nguyen TT (2015) Mining user opinions in mobile app reviews: a keyword-based approach (t). In: 2015 30th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 749–759
Vu PM, Pham HV, Nguyen TT et al (2016) Phrase-based extraction of user opinions in mobile app reviews. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering. ACM, pp 726–731
Wang S, Lo D, Vasilescu B, Serebrenik A (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 291–300
Webb GI (2006) Discovering significant rules. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 434–443
Wu Y, Wang N, Kropczynski J, Carroll JM (2017) The appropriation of github for curation. PeerJ Preprints 5:e2952v1
Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 287–296
Xu DML, Bodík R, Kimelman D (2005) Jungloid mining: helping to navigate the api jungle. In: POPL
Xu C, Bai Y, Bian J, Gao B, Wang G, Liu X, Liu TY (2014) Rc-net: a general framework for incorporating knowledge into word representations. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management. ACM, pp 1219–1228
Xu B, Ye D, Xing Z, Xia X, Chen G, Li S (2016) Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering. ACM, pp 51–62
Ye D, Xing Z, Li J, Kapre N (2016a) Software-specific part-of-speech tagging: an experimental study on stack overflow. In: Proceedings of the 31st annual ACM symposium on applied computing. ACM, pp 1378–1385
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016b) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 404–415
Zhong H, Xie T, Zhang L, Pei J, Mei H (2009) Mapo: mining and recommending api usage patterns. In: ECOOP 2009–Object-Oriented Programming. Springer, pp 318–343
Zhong H, Thummalapenta S, Xie T, Zhang L, Wang Q (2010) Mining api mapping for language migration. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol 1. ACM, pp 195–204
Zhou G, He T, Zhao J, Hu P (2015) Learning continuous word embedding with metadata for question retrieval in community question answering. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, pp 250–259
Acknowledgements
We thank the reviewers for their valuable feedback. This work is partially supported by a seed grant from Monash University.
Additional information
Communicated by: Yasutaka Kamei
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chen, C., Xing, Z. & Liu, Y. What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. Empir Software Eng 24, 1155–1194 (2019). https://doi.org/10.1007/s10664-018-9657-y
DOI: https://doi.org/10.1007/s10664-018-9657-y