research-article

An approach to helping developers learn open source projects based on machine learning

Authors:

Junrui Guan, and

Yanchun Sun

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware

October 2019

Article No.: 13, Pages 1 - 10

https://doi.org/10.1145/3361242.3361251

Published: 28 October 2019

Abstract

Developers often learn good coding practices and design patterns by reading the code of well-known open-source projects, and they contribute to such projects to improve their programming skills. A developer who has just joined an existing open-source project must first read and understand the project code. However, almost no project maintains design documentation. Developers can rely only on user guides (which focus on how to use the code rather than how to develop it) or on brief code comments, which makes the code relatively difficult for newcomers to understand. To help developers learn open-source projects more quickly, we propose an approach based on machine learning. First, we build a code structure graph for the project code by static analysis. Second, we implement a project entry recommendation approach based on clustering and machine learning that recommends project entries suitable for developers to read. Third, we implement a learning path recommendation algorithm that recommends learning paths based on the function nodes that developers select in the code structure graph, helping them understand open-source projects better. In our experiments, we select two well-known C++ open-source projects, Lua and Memcache, as examples for learning path recommendation. The experimental results show that our approach saves developers considerable time in learning open-source projects while maintaining the accuracy of the recommendations.
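The abstract does not spell out the algorithms, so the following is only a rough illustration of the general idea, not the authors' method. It assumes the code structure graph can be reduced to a caller-to-callee mapping, uses "functions with no callers" as a naive stand-in for the clustering-based entry recommendation, and uses a plain breadth-first traversal as a stand-in for the learning path recommendation. The graph contents and the function names `candidate_entries` and `learning_path` are hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees (illustrative names only).
CALL_GRAPH = {
    "main": ["parse_args", "run"],
    "run": ["load", "execute"],
    "load": ["read_file"],
    "execute": ["read_file"],
    "parse_args": [],
    "read_file": [],
}


def candidate_entries(graph):
    """Return functions that no other function calls.

    This is a naive stand-in for entry recommendation; the paper
    describes a clustering and machine learning approach instead.
    """
    called = {callee for callees in graph.values() for callee in callees}
    return sorted(fn for fn in graph if fn not in called)


def learning_path(graph, start):
    """Breadth-first reading order from `start`.

    Functions closer to `start` in the call graph come first,
    and each function is listed once.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order


if __name__ == "__main__":
    print(candidate_entries(CALL_GRAPH))
    print(learning_path(CALL_GRAPH, "run"))
```

In this sketch a developer would pick a starting function node (here `run`) and read the functions in the returned order; a real tool would extract the graph from the project by static analysis rather than hard-coding it.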

Cited By

Idialu O, Mathews N, Maipradit R, Atlee J, Nagappan M, Spinellis D, Constantinou E, Bacchelli A (2024). Whodunit: Classifying Code as Human Authored or GPT-4 Generated - A case study on CodeChef problems. Proceedings of the 21st International Conference on Mining Software Repositories, pp. 394-406. Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644926
Viduka D, Ličina B, Kraguljac V (2021). Open model of education using Open Source principles. Trendovi u poslovanju 9:1, pp. 40-48. Online publication date: 2021.
https://doi.org/10.5937/trendpos2101041V
Yin H, Sun Z, Sun Y, Huang G (2021). Automatic Learning Path Recommendation for Open Source Projects Using Deep Learning on Knowledge Graphs. 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 824-833. Online publication date: Jul-2021.
https://doi.org/10.1109/COMPSAC51774.2021.00115

Index Terms

An approach to helping developers learn open source projects based on machine learning
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
2. Security and privacy
  1. Software and application security
    1. Software reverse engineering

Published In

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware

October 2019

179 pages

ISBN:9781450377010

DOI:10.1145/3361242

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key Research and Development Program of China
National Natural Science Foundation of China
National Basic Research Program of China

Conference

Internetware '19

Internetware '19: The 11th Asia-Pacific Symposium on Internetware

October 28 - 29, 2019

Fukuoka, Japan

Acceptance Rates

Internetware '19 Paper Acceptance Rate 20 of 35 submissions, 57%;

Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 153
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 0
