Research article
DOI: 10.1145/3131704.3131706

Scalable Relevant Project Recommendation on GitHub

Published: 23 September 2017

Abstract

GitHub, one of the largest social coding platforms, fosters a flexible and collaborative development process. In practice, developers on this open-source platform need to find projects relevant to their own work in order to reuse their functionality, explore ideas for possible features, or analyze the requirements of their projects. Recommending relevant projects to a developer is difficult: millions of projects are hosted on GitHub, and different developers may have different requirements for what counts as relevant. In this paper, we propose a scalable and personalized approach to recommending projects that leverages both developers' behaviors and project features. Based on the features of the projects a developer has created and the developer's behaviors toward other projects, our approach automatically recommends the top-N most relevant software projects to that developer. Moreover, to improve scalability, we implement our approach on a parallel processing framework (Apache Spark) so that large-scale GitHub data can be analyzed for efficient recommendation. We perform an empirical study on data crawled from GitHub, and the results show that our approach can efficiently recommend relevant software projects with relatively high precision, matching developers' interests.
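The abstract describes the approach only at a high level. The following is a minimal sketch of a content-plus-behavior top-N ranking, assuming that each project's features are a bag of terms (for example, language, topics, README words) and that a developer's behaviors are create/fork/star/watch events. It is not the authors' algorithm: the behavior weights, the cosine similarity measure, and all function names (`developer_profile`, `recommend_top_n`) are hypothetical choices made for illustration.

```python
"""Hypothetical sketch of hybrid top-N project recommendation.

Assumption: this is NOT the paper's method. It only illustrates combining
project features with a developer's behaviors into one relevance score.
"""
from __future__ import annotations

import heapq
import math
from collections import Counter

# Hypothetical behavior weights; the abstract does not specify any weighting.
BEHAVIOR_WEIGHTS = {"create": 1.0, "fork": 0.8, "star": 0.5, "watch": 0.3}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def developer_profile(features: dict[str, Counter],
                      behaviors: list[tuple[str, str]]) -> Counter:
    """Sum the feature vectors of projects the developer touched, weighted by action."""
    profile: Counter = Counter()
    for project, action in behaviors:
        weight = BEHAVIOR_WEIGHTS.get(action, 0.0)
        for term, count in features.get(project, Counter()).items():
            profile[term] += weight * count
    return profile


def recommend_top_n(features: dict[str, Counter],
                    behaviors: list[tuple[str, str]], n: int) -> list[str]:
    """Return the n unseen projects most similar to the developer's profile."""
    profile = developer_profile(features, behaviors)
    seen = {project for project, _ in behaviors}
    scored = (
        (cosine(profile, vec), name)
        for name, vec in features.items()
        if name not in seen
    )
    return [name for _, name in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    # Toy, made-up data for demonstration only.
    features = {
        "web-app": Counter(["python", "flask", "web"]),
        "ml-lib": Counter(["python", "numpy", "ml"]),
        "flask-ext": Counter(["python", "flask", "plugin"]),
        "rust-cli": Counter(["rust", "cli"]),
    }
    behaviors = [("web-app", "create")]
    print(recommend_top_n(features, behaviors, 2))  # ['flask-ext', 'ml-lib']
```

In this toy run the developer created `web-app`, so `flask-ext` (sharing two of three terms) ranks above `ml-lib` (sharing one), and `rust-cli` is dropped. In a Spark deployment, the per-candidate scoring loop is the natural part to distribute across partitions of candidate projects; the sketch keeps everything in memory for clarity.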


Published In

ACM Other conferences
Internetware '17: Proceedings of the 9th Asia-Pacific Symposium on Internetware
September 2017
172 pages
ISBN:9781450353137
DOI:10.1145/3131704
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GitHub
  2. Software recommendation
  3. parallel processing frame

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware'17

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 29
  • Downloads (last 6 weeks): 0
Reflects downloads up to 13 Jan 2025

Cited By

  • (2024) Wiki2GH: A Recommendation Service to Link Software Engineering Knowledge to Practical Development. Service Science, 203-220. DOI: 10.1007/978-981-97-5760-2_14. Online publication date: 19 Aug 2024.
  • (2022) Profiling developers to predict vulnerable code changes. Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering, 32-41. DOI: 10.1145/3558489.3559069. Online publication date: 7 Nov 2022.
  • (2022) An Open-source Repository Retrieval Service Using Functional Semantics for Software Developers. 2022 International Conference on Service Science (ICSS), 12-20. DOI: 10.1109/ICSS55994.2022.00012. Online publication date: May 2022.
  • (2022) Improving Personalized Project Recommendation on GitHub Based on Deep Matrix Factorization. Collaborative Computing: Networking, Applications and Worksharing, 318-332. DOI: 10.1007/978-3-030-92635-9_19. Online publication date: 1 Jan 2022.
  • (2021) Sequential Recommendations on GitHub Repository. Applied Sciences 11, 4 (1585). DOI: 10.3390/app11041585. Online publication date: 10 Feb 2021.
  • (2021) FunkR-pDAE: Personalized Project Recommendation Using Deep Learning. IEEE Transactions on Emerging Topics in Computing 9, 2, 886-900. DOI: 10.1109/TETC.2018.2870734. Online publication date: 1 Apr 2021.
  • (2021) GHTRec: A Personalized Service to Recommend GitHub Trending Repositories for Developers. 2021 IEEE International Conference on Web Services (ICWS), 314-323. DOI: 10.1109/ICWS53863.2021.00049. Online publication date: Sep 2021.
  • (2018) Measuring and Predicting the Relevance Ratings between FLOSS Projects using Topic Features. Proceedings of the 10th Asia-Pacific Symposium on Internetware, 1-10. DOI: 10.1145/3275219.3275222. Online publication date: 16 Sep 2018.