Identifying features in forks

Research article
Published: 27 May 2018
DOI: 10.1145/3180155.3180205

Abstract

Fork-based development has been widely used both in open source communities and in industry, because it gives developers flexibility to modify their own fork without affecting others. Unfortunately, this mechanism has downsides: when the number of forks becomes large, it is difficult for developers to get or maintain an overview of activities in the forks. Current tools provide little help. We introduce Infox, an approach to automatically identify non-merged features in forks and to generate an overview of active forks in a project. The approach clusters cohesive code fragments using code and network-analysis techniques and uses information-retrieval techniques to label clusters with keywords. The clustering is effective, with 90% accuracy on a set of known features. In addition, a human-subject evaluation shows that Infox can provide actionable insight for developers of forks.
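
To make the two steps named in the abstract concrete, the following is a minimal illustrative sketch and not Infox's actual implementation. It assumes that changed code fragments and the dependencies between them are already known, uses connected components as a deliberately simple stand-in for the network-analysis step, and uses a TF-IDF-style weighting as a stand-in for the information-retrieval step. All names (`cluster_fragments`, `label_clusters`, the fragment ids, and the sample code strings) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def cluster_fragments(edges, nodes):
    """Group fragments into connected components using union-find.

    Stand-in for the paper's network analysis: fragments joined by a
    dependency edge end up in the same cluster.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def tokenize(text):
    """Split code into lowercase words, breaking camelCase identifiers."""
    parts = re.findall(r"[A-Za-z][a-z]*", text)
    return [p.lower() for p in parts if len(p) > 2]


def label_clusters(clusters, code, top_k=2):
    """Label each cluster with its top_k terms by TF-IDF-style score."""
    docs = [Counter(t for n in c for t in tokenize(code[n])) for c in clusters]
    # Document frequency: in how many clusters each term occurs.
    df = Counter(t for d in docs for t in d)
    labels = []
    for d in docs:
        scores = {t: tf * math.log(len(docs) / df[t]) for t, tf in d.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(ranked[:top_k])
    return labels


# Hypothetical changed fragments in a fork, keyed by fragment id.
code = {
    "f1": "void setFanSpeed(int speed) { fanSpeed = speed; }",
    "f2": "int getFanSpeed() { return fanSpeed; }",
    "f3": "void drawMenu() { lcdPrint(menuTitle); }",
    "f4": "void lcdPrint(char *text) { lcdWrite(text); }",
}
# Hypothetical dependency edges between fragments.
edges = [("f1", "f2"), ("f3", "f4")]

clusters = cluster_fragments(edges, code)
labels = label_clusters(clusters, code)
for cluster, label in zip(clusters, labels):
    print(cluster, label)
```

Terms that occur in every cluster (here `void`) receive a weight of zero, so the labels favor words that distinguish one cluster from the others.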

Information

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering
May 2018
1307 pages
ISBN: 9781450356381
DOI: 10.1145/3180155
  • Conference Chair: Michel Chaudron
  • General Chair: Ivica Crnkovic
  • Program Chairs: Marsha Chechik, Mark Harman

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICSE '18

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 38
  • Downloads (last 6 weeks): 8

Reflects downloads up to 01 Nov 2024

Cited By

  • (2024) On the Expressive Power of Languages for Static Variability. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (1018-1050). DOI: 10.1145/3689747. Online publication date: 8-Oct-2024
  • (2024) Forking From the Future: How an Interorganizational Network Learned Its Way to New Software Business. IEEE Transactions on Engineering Management 71 (2744-2757). DOI: 10.1109/TEM.2022.3193959. Online publication date: 2024
  • (2024) Use the Forks, Look! Visualizations for Exploring Fork Ecosystems. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (993-1004). DOI: 10.1109/SANER60148.2024.00107. Online publication date: 12-Mar-2024
  • (2023) A Vision on Intentions in Software Engineering. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2117-2121). DOI: 10.1145/3611643.3613087. Online publication date: 30-Nov-2023
  • (2023) User Perspectives on Branching in Computer-Aided Design. Proceedings of the ACM on Human-Computer Interaction 7:CSCW2 (1-30). DOI: 10.1145/3610220. Online publication date: 4-Oct-2023
  • (2023) In the Age of Collaboration, the Computer-Aided Design Ecosystem is Behind: An Interview Study of Distributed CAD Practice. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1-29). DOI: 10.1145/3579613. Online publication date: 16-Apr-2023
  • (2023) DSDGen. Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume B (47-56). DOI: 10.1145/3579028.3609015. Online publication date: 28-Aug-2023
  • (2023) VariantInc. Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A (129-140). DOI: 10.1145/3579027.3608984. Online publication date: 28-Aug-2023
  • (2023) Benchmark Generation with VEVOS: A Coverage Analysis of Evolution Scenarios in Variant-Rich Systems. Proceedings of the 17th International Working Conference on Variability Modelling of Software-Intensive Systems (13-22). DOI: 10.1145/3571788.3571793. Online publication date: 25-Jan-2023
  • (2023) DupHunter: Detecting Duplicate Pull Requests in Fork-Based Development. IEEE Transactions on Software Engineering 49:4 (2920-2940). DOI: 10.1109/TSE.2023.3235942. Online publication date: 1-Apr-2023
