What’s Spain’s Paris? Mining analogical libraries from Q&A discussions

Empirical Software Engineering

Abstract

Third-party libraries are an integral part of many software projects. Developers often need to find analogical libraries, that is, libraries in a different programming language or on a different mobile platform that provide features comparable to the libraries they already know. Existing ways of finding analogical libraries rely on community-curated lists, blogs, or Q&A posts, which often contain overwhelming or out-of-date information. In this paper, we present a new approach that recommends analogical libraries based on a knowledge base mined from the tags of millions of Stack Overflow questions. The novelty of our approach lies in solving analogical-library questions by combining a state-of-the-art word embedding technique with domain-specific relational and categorical knowledge mined from Stack Overflow. Given a library and a recommended analogical library, our approach further extracts question and answer snippets from Stack Overflow that compare analogical libraries, which can offer useful information scents for developers investigating the recommendation further. We implemented our approach in a proof-of-concept web application, which more than 34.8 thousand users visited from November 2015 to August 2017. Our evaluation shows that our approach recommends analogical libraries accurately. We also demonstrate the usefulness of our recommendations by using them to answer analogical-library questions on Stack Overflow. Google Analytics data on our website traffic, together with an analysis of visitors’ interactions with the website contents, provide insights into the usage patterns and the system design of our web application.
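
The idea of answering an analogical-library question with word embeddings plus categorical knowledge can be pictured with a toy sketch. The snippet below is not the implementation described in the paper: the three-dimensional vectors, the `LIBRARY_TAGS` set, and the function name `analogical_library` are all invented for illustration. It assumes the common vector-offset formulation of analogy (library - source language + target language) and uses a category filter so that only tags known to be libraries can be returned.

```python
import math

# Toy 3-d embeddings, invented for illustration only (not taken from the paper).
TOY_VECTORS = {
    "java":     [1.0, 0.0, 0.0],
    "python":   [0.0, 1.0, 0.0],
    "junit":    [1.0, 0.0, 1.0],
    "unittest": [0.0, 1.0, 1.0],
    "numpy":    [0.0, 1.0, -1.0],
    "log4j":    [1.0, 0.0, 0.5],
}

# Hypothetical categorical knowledge: which tags denote libraries
# (as opposed to languages, platforms, or concepts).
LIBRARY_TAGS = {"junit", "unittest", "numpy", "log4j"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogical_library(library, source_lang, target_lang, vectors=TOY_VECTORS):
    """Return the library tag closest to: library - source_lang + target_lang."""
    query = [
        lib - src + tgt
        for lib, src, tgt in zip(
            vectors[library], vectors[source_lang], vectors[target_lang]
        )
    ]
    # Categorical filter: only library tags are candidates, never the query itself.
    candidates = sorted(
        tag for tag in LIBRARY_TAGS if tag in vectors and tag != library
    )
    return max(candidates, key=lambda tag: cosine(query, vectors[tag]))


print(analogical_library("junit", "java", "python"))
```

With these toy vectors the query "what is Python's JUnit?" lands on `unittest`. In a realistic setting the vectors would be learned from tag co-occurrence data and the category filter would come from mined knowledge rather than a hand-written set.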

Notes

  1. https://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#cite_note-496

  2. https://meta.stackexchange.com/questions/77808/does-it-matter-the-order-you-tag-your-question

  3. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

  4. A complete list can be found at https://graphofknowledge.appspot.com/libCategory

  5. https://code.google.com/p/word2vec/

  6. https://en.wikipedia.org/wiki/Search_engine_optimization

  7. https://webmasters.googleblog.com/2016/03/continuing-to-make-web-more-mobile.html

  8. https://goo.gl/nb5czF

  9. The detailed threshold to discriminate popular or unpopular queries is a commercial secret of Google.

  10. https://github.com/ClearTK/cleartk

  11. The list of sampled questions can be found at https://graphofknowledge.appspot.com/questions

  12. https://analytics.google.com/

  13. As most search engine robots do not activate javascript, robot traffic is not counted in Google Analytics (Google 2016).

  14. http://ipinfo.io/

  15. http://stackoverflow.com/questions/212151/

  16. www.similarweb.com/

  17. http://alternativeto.net/

  18. https://graphofknowledge.appspot.com/similartech

References

  • Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM sigmod record, vol 22. ACM, pp 207–216

  • Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, VLDB, pp 487–499

  • Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654

  • Bird S (2006) Nltk: the natural language toolkit. In: Proceedings of the COLING/ACL on interactive presentation sessions. Association for Computational Linguistics, pp 69–72

  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10):P10008

  • Chan WK, Cheng H, Lo D (2012) Searching connected api subgraph via text phrases. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering. ACM, p 10

  • Chen C, Xing Z (2016a) Similartech: automatically recommend analogical libraries across different programming languages. In: 2016 31st IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 834–839

  • Chen C, Xing Z (2016b) Towards correlating search on google and asking on stack overflow. In: The 40th IEEE computer society international conference on computers, software & applications. IEEE, pp 83–92

  • Chen W, Zhang Y, Zhang M (2014) Feature embedding for dependency parsing. In: Proceedings of the international conference on computational linguistics

  • Chen C, Gao S, Xing Z (2016a) Mining analogical libraries in q&a discussions–incorporating relational and categorical knowledge into word embedding. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1. IEEE, pp 338–348

  • Chen C, Xing Z, Han L (2016b) Techland: assisting technology landscape inquiries with insights from stack overflow. In: 2016 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 356–366

  • Chen G, Chen C, Xing Z, Xu B (2016c) Learning a dual-language vector space for domain-specific cross-lingual question retrieval. In: Proceedings of the 31st IEEE/ACM International Conference On Automated Software Engineering. ACM, pp 744–755

  • Chen C, Xing Z, Liu Y (2017a) By the community & for the community: a deep learning approach to assist collaborative editing in q&a sites. PACMHCI 1 (CSCW):32:1–32:21

  • Chen C, Xing Z, Wang X (2017b) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 450–461

  • Chen C, Chen X, Sun J, Xing Z, Li G (2018) Data-driven proactive policy assurance of post quality in community q&a sites. PACMHCI 2 (CSCW):33:1–33:22

  • Cilibrasi RL, Vitanyi P (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383

  • Deshmukh J, Podder S, Sengupta S, Dubash N et al (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 115–124

  • Gligorov R, ten Kate W, Aleksovski Z, Van Harmelen F (2007) Using google distance to weight approximate ontology matches. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 767–776

  • Google (2015) Google trends. https://www.google.com.sg/trends/

  • Google (2016) Google analytics policy. https://support.google.com/analytics/answer/1315708?hl=en

  • Huang Y, Chen C, Xing Z, Lin T, Liu Y (2018) Tell them apart: distilling technology differences from crowd-scale comparison discussions. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. ACM, pp 214–224

  • Kazama J, Torisawa K (2007) Exploiting wikipedia as external knowledge for named entity recognition. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 698–707

  • Li G, Zhu H, Lu T, Ding X, Gu N (2015) Is it good to be like wikipedia?: Exploring the trade-offs of introducing collaborative editing model to q&a sites. In: Proceedings of the 18th ACM conference on computer supported cooperative work & social computing. ACM, pp 1080–1091

  • Manning CD, Raghavan P, Schütze H et al (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge

  • Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60

  • Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  • Mikolov T, Sutskever I, Chen K, Corrado G S, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  • Mikolov T, Yih WT, Zweig G (2013c) Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp 746–751

  • Nasehi SM, Sillito J, Maurer F, Burns C (2012) What makes a good code example?: A study of programming q&a in stackoverflow. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 25–34

  • Nguyen TT, Nguyen AT, Nguyen HA (2013) A statistical semantic language model for source code. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 532–542

  • Nguyen AT, Nguyen HA, Nguyen TT, Nguyen TN (2014) Statistical learning approach for mining api usage mappings for code migration. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering. ACM, pp 457–468

  • Nguyen TD, Nguyen AT, Nguyen TN (2016) Mapping api elements for code migration with vector representations. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 756–758

  • Nguyen TD, Nguyen AT, Phan HD, Nguyen TN (2017) Exploring api embedding for api usages and applications. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 438–449

  • Student (1908) The probable error of a mean. Biometrika VI:1–25

  • Teyton C, Falleri JR, Blanc X (2012) Mining library migration graphs. In: 2012 19th working conference on reverse engineering (WCRE). IEEE, pp 289–298

  • Teyton C, Falleri JR, Blanc X (2013) Automatic discovery of function mappings between similar libraries. In: 2013 20th working conference on reverse engineering (WCRE). IEEE, pp 192–201

  • Teyton C, Falleri JR, Palyart M, Blanc X (2014) A study of library migrations in java. J Softw: Evol Process 26(11):1030–1052

  • Thummalapenta S, Xie T (2007) Parseweb: a programmer assistant for reusing open source code on the web. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 204–213

  • Thung F, Lo D, Lawall J (2013a) Automated library recommendation. In: 2013 20th working conference on reverse engineering (WCRE). IEEE, pp 182–191

  • Thung F, Wang S, Lo D, Lawall J (2013b) Automatic recommendation of api methods from feature requests. In: 2013 IEEE/ACM 28th international conference on automated software engineering (ASE). IEEE, pp 290–300

  • Turney PD (2006) Similarity of semantic relations. Comput Linguist 32(3):379–416

  • Van Nguyen T, Nguyen AT, Nguyen TN (2016) Characterizing api elements in software documentation with vector representation. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 749–751

  • Vasilescu B, Serebrenik A, Goeminne M, Mens T (2014) On the variation and specialisation of workload—a case study of the gnome ecosystem community. Empir Softw Eng 19(4):955–1008

  • Vu PM, Nguyen TT, Pham HV, Nguyen TT (2015) Mining user opinions in mobile app reviews: a keyword-based approach (t). In: 2015 30th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 749–759

  • Vu PM, Pham HV, Nguyen TT et al (2016) Phrase-based extraction of user opinions in mobile app reviews. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering. ACM, pp 726–731

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 291–300

  • Webb GI (2006) Discovering significant rules. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 434–443

  • Wu Y, Wang N, Kropczynski J, Carroll JM (2017) The appropriation of github for curation. PeerJ Preprints 5:e2952v1

  • Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: 2013 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 287–296

  • Xu DML, Bodík R, Kimelman D (2005) Jungloid mining: helping to navigate the api jungle. In: POPL

  • Xu C, Bai Y, Bian J, Gao B, Wang G, Liu X, Liu TY (2014) Rc-net: a general framework for incorporating knowledge into word representations. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management. ACM, pp 1219–1228

  • Xu B, Ye D, Xing Z, Xia X, Chen G, Li S (2016) Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering. ACM, pp 51–62

  • Ye D, Xing Z, Li J, Kapre N (2016a) Software-specific part-of-speech tagging: an experimental study on stack overflow. In: Proceedings of the 31st annual ACM symposium on applied computing. ACM, pp 1378–1385

  • Ye X, Shen H, Ma X, Bunescu R, Liu C (2016b) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th international conference on software engineering. ACM, pp 404–415

  • Zhong H, Xie T, Zhang L, Pei J, Mei H (2009) Mapo: mining and recommending api usage patterns. In: ECOOP 2009–Object-Oriented Programming. Springer, pp 318–343

  • Zhong H, Thummalapenta S, Xie T, Zhang L, Wang Q (2010) Mining api mapping for language migration. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol 1. ACM, pp 195–204

  • Zhou G, He T, Zhao J, Hu P (2015) Learning continuous word embedding with metadata for question retrieval in community question answering. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, pp 250–259

Acknowledgements

We thank the reviewers for their valuable feedback. This work is partially supported by a seed grant from Monash University.

Author information

Corresponding author

Correspondence to Chunyang Chen.

Additional information

Communicated by: Yasutaka Kamei

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, C., Xing, Z. & Liu, Y. What’s Spain’s Paris? Mining analogical libraries from Q&A discussions. Empir Software Eng 24, 1155–1194 (2019). https://doi.org/10.1007/s10664-018-9657-y

  • DOI: https://doi.org/10.1007/s10664-018-9657-y
