Abstract
Developing and maintaining large software systems typically requires that developers collaborate on many tasks. During such collaborations, when multiple people work on the same chunk of code at the same time, they communicate with each other and employ safeguards in various ways. Recent studies have considered group co-development in OSS projects and found that it is an essential part of many projects. However, those studies were limited to groups of size two, i.e., pairs of developers. Here we go further and characterize co-development in larger groups. We develop an effective methodology for capturing distributed collaboration beyond groups of size two, based on synchronized commit activities among multiple developers, and apply it to data from 26 OSS projects from the Apache Software Foundation. We find that distributed collaboration is prevalent, but not as frequent as expected. We also find that developers behave differently in distributed collaborative groups than when programming alone; e.g., high developer focus on specific code packages is associated with lower team participation, while packages with higher ownership get less attention from groups than from individuals. Finally, we show that developers' productivity effort is more often lower while they co-develop in groups. To verify our results, we use both quantitative and qualitative methods, including a developer survey. We conclude that these methods and results can be used to understand the effects of the collaborative dynamic in OSS teams on the software engineering process. Our code, along with our datasets and survey, is available at http://www.gharehyazie.com/supplementary/teamwork/.
Notes
Linus Torvalds runs the Linux project in a more centralized fashion, depending on his lieutenants for decisions regarding which new code filters up to him.
Most of our studied projects are written in Java, where files within the same directory are considered to be in the same package. The three non-Java projects, “axis2_c”, “log4net”, and “log4php”, use the same file structure as their Java counterparts, “axis2_java” and “log4j”.
We speak of files rather than packages at this stage because commit datasets record files, so the randomization must be done at the file level. All results extracted from these randomized datasets are still based on package-level code proximity.
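To make the package convention above concrete, the following minimal Python sketch maps a file path to its package by taking the file's parent directory. The helper names `package_of` and `same_package` and the example paths are hypothetical; the sketch assumes repository-relative, forward-slash paths as they typically appear in commit logs.

```python
from pathlib import PurePosixPath


def package_of(path: str) -> str:
    """Return the package of a file, taken to be its parent directory.

    Hypothetical helper: assumes repository-relative, '/'-separated paths.
    Files at the repository root map to the empty string.
    """
    parent = PurePosixPath(path).parent
    return "" if str(parent) == "." else str(parent)


def same_package(path_a: str, path_b: str) -> bool:
    """Two files are in the same package iff they share a directory."""
    return package_of(path_a) == package_of(path_b)


if __name__ == "__main__":
    a = "src/main/java/org/apache/log4j/Logger.java"
    b = "src/main/java/org/apache/log4j/Level.java"
    c = "src/main/java/org/apache/log4j/net/SocketAppender.java"
    print(package_of(a))       # src/main/java/org/apache/log4j
    print(same_package(a, b))  # True
    print(same_package(a, c))  # False: a subdirectory is a different package
```

Because the rule depends only on directory layout, the same helper would apply unchanged to the non-Java projects that mirror the Java file structure.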
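The distinction drawn in this note, randomizing over files while measuring at the package level, can be illustrated with a short Python sketch. It assumes one possible null model (a random permutation of file labels across all commit records, keeping each record's developer and timestamp fixed); this is not necessarily the exact randomization used in the study, and the record layout and function names are hypothetical.

```python
import random
from collections import Counter
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical commit record: (developer, timestamp, file path).
Record = Tuple[str, int, str]


def package_of(path: str) -> str:
    """Package = parent directory of the file ('' for root-level files)."""
    parent = PurePosixPath(path).parent
    return "" if str(parent) == "." else str(parent)


def randomize_files(records: List[Record], seed: int = 0) -> List[Record]:
    """Permute file labels across records; developers and times stay put.

    The permutation acts on files because that is what commit data
    records; the multiset of files is preserved exactly.
    """
    rng = random.Random(seed)
    files = [f for _, _, f in records]
    rng.shuffle(files)
    return [(dev, ts, f) for (dev, ts, _), f in zip(records, files)]


def package_activity(records: List[Record]) -> Counter:
    """Aggregate (developer, package) touch counts at the package level."""
    return Counter((dev, package_of(f)) for dev, _, f in records)


if __name__ == "__main__":
    commits = [
        ("alice", 100, "core/a/X.java"),
        ("bob", 105, "core/a/Y.java"),
        ("carol", 110, "util/Z.java"),
        ("alice", 200, "util/W.java"),
    ]
    shuffled = randomize_files(commits, seed=42)
    print(package_activity(commits))   # observed package-level activity
    print(package_activity(shuffled))  # one randomized counterpart
```

Repeating the shuffle with different seeds yields a distribution of package-level statistics against which the observed values can be compared.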
We manually inspected a number of CoGs and, from the contents of their messages, confirmed that developers were indeed coordinating their collaboration as predicted. This encouraged us to develop the automated, though necessarily more simplistic, large-scale analysis presented here.
We search for the names of files within the packages subject to collaboration because, in technical discussions, file names occur more naturally and more frequently than package names.
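A minimal Python sketch of the search described in this note follows. It assumes messages are plain-text bodies and that matching is done case-sensitively on whole file base names (e.g., `Logger.java`); the function name and the matching rules are hypothetical choices for illustration, not the study's exact procedure.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List


def mentions_files(message: str, files: Iterable[str]) -> List[str]:
    """Return the sorted base names of `files` that appear in `message`.

    The lookbehind rejects a preceding word character or dot, so
    'Logger.java' does not match inside 'MyLogger.java'; the lookahead
    rejects a following word character, so it does not match
    'Logger.javadoc'.
    """
    names = sorted({PurePosixPath(f).name for f in files})
    found = []
    for name in names:
        pattern = r"(?<![\w.])" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, message):
            found.append(name)
    return found


if __name__ == "__main__":
    package_files = [
        "src/main/java/org/apache/log4j/Logger.java",
        "src/main/java/org/apache/log4j/Level.java",
    ]
    msg = "I think the NPE comes from Logger.java; see the patch."
    print(mentions_files(msg, package_files))  # ['Logger.java']
```

Matching base names rather than full paths reflects the observation above: developers usually refer to a file by its name, not by its directory.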
The word cloud was created using the “comparison.cloud” function in the “wordcloud” package in R.
References
Adams PJ, Capiluppi A, Boldyreff C (2009) Coordination and productivity issues in free software: The role of Brooks’ law. In: IEEE International Conference on Software Maintenance, 2009. ICSM 2009, pages 319–328. IEEE
Al-Ani B, Edwards HK (2008) A comparative empirical study of communication in distributed and collocated development teams. In: ICGSE IEEE International Conference on Global Software Engineering, 2008, pages 35–44. IEEE
Avritzer A, Paulish DJ (2010) A comparison of commonly used processes for multi-site software development. In: Collaborative Software Engineering, pages 285–302. Springer
Baruch Y (1999) Response rate in academic studies: a comparative analysis. Human Relations 52(4):421–438
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, pages 137–143. ACM
Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t touch my code!: examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 4–14. ACM
Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P (2008) Latent social structure in open source projects. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pages 24–35. ACM
Blüthgen N, Menzel F, Blüthgen N (2006) Measuring specialization in species interaction networks. BMC Ecology 6(1):9
Brooks Jr FP (1995) The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA USA
Caglayan B, Bener AB, Miranskyy A (2013) Emergence of developer teams in the collaboration network. In: Cooperative and Human Aspects of Software Engineering (CHASE), 2013 6th International Workshop on, pages 33–40. IEEE
Carmel E (1999) Global software teams: collaborating across borders and time zones. Prentice Hall PTR
Cataldo M, Herbsleb JD (2013) Coordination breakdowns and their impact on development productivity and software failures. IEEE Trans Softw Eng 39(3):343–360
Child J (1972) Organizational structure, environment and performance: the role of strategic choice. Sociology 6(1):1–22
Cohen PR, Levesque HJ (1991) Teamwork. SRI International, Menlo Park
Crowston K, Li Q, Wei K, Eseryel UY, Howison J (2007) Self-organization of teams for free/libre open source software development. J Inf Softw Technol 49(6):564–575
Dabbish L, Stuart C, Tsay J, Herbsleb J (2012) Social coding in GitHub: transparency and collaboration in an open software repository. In: Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, pages 1277–1286. ACM
Damian D, Izquierdo L, Singer J, Kwan I (2007) Awareness in the wild: Why communication breakdowns occur. In: Global Software Engineering, 2007. ICGSE 2007. Second IEEE International Conference on, pages 81–90. IEEE
Di Penta M, Harman M, Antoniol G, Qureshi F (2007) The effect of communication overhead on software maintenance project staffing: a search-based approach. In: Software Maintenance, 2007. ICSM 2007. IEEE International Conference on, pages 315–324. IEEE
Dugatkin LA (1997) Cooperation among animals. Oxford Series in Ecology and Evolution
Foucault M, Falleri J-R, Blanc X (2014) Code ownership in open-source software. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, page 39. ACM
Gharehyazie M, Posnett D, Filkov V (2013) Social activities rival patch submission for prediction of developer initiation in oss projects. In: Software Maintenance (ICSM), 2013 29th IEEE International Conference on, pages 340–349. IEEE
Gharehyazie M, Posnett D, Vasilescu B, Filkov V (2014) Developer initiation and social interactions in oss: A case study of the apache software foundation. Empir Softw Eng:1–36
Goeminne M, Claes M, Mens T (2013) A historical dataset for the gnome ecosystem
Grechanik M, Jones JA, Orso A, van der Hoek A (2010) Bridging gaps between developers and testers in globally-distributed software development. In: Proceedings of the FSE/SDP workshop on Future of software engineering research, pages 149–154. ACM
Gutwin C, Penner R, Schneider K (2004) Group awareness in distributed software development. In: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81. ACM
Guzzi A, Bacchelli A, Lanza M, Pinzger M, Deursen AV (2013) Communication in open source software development mailing lists. In: MSR, pages 277–286. IEEE
Herbsleb JD (2007) Global software engineering: The future of socio-technical coordination. In: 2007 Future of Software Engineering, pages 188–198. IEEE Computer Society
Herbsleb J, Grinter RE (1999) Architectures, coordination, and distance: Conway’s law and beyond. IEEE Softw 16(5):63–70
Herbsleb J, Mockus A, Finholt TA, Grinter RE (2001) An empirical study of global software development: distance and speed. In: Proceedings of the 23rd international conference on software engineering, pages 81–90. IEEE Computer Society
Herbsleb JD, Moitra D (2001) Global software development. IEEE Softw 18(2):16–20
Hertel G, Niedner S, Herrmann S (2003) Motivation of software developers in open source projects: an internet-based survey of contributors to the linux kernel. Res Policy 32(7):1159–1177
Holmstrom H, Ó Conchúir E, Ågerfalk PJ, Fitzgerald B (2006) Global software development challenges: A case study on temporal, geographical and socio-cultural distance. In: Global Software Engineering, 2006. ICGSE’06. International Conference on, pages 3–11. IEEE
Jermakovics A, Sillitti A, Succi G (2011) Mining and visualizing developer networks from version control systems. In: Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, pages 24–31. ACM
Kakimoto T, Kamei Y, Ohira M, Matsumoto K (2006) Social network analysis on communications for knowledge collaboration in oss communities
Kampstra P et al (2008) Beanplot: A boxplot alternative for visual comparison of distributions. J Stat Softw 28(1):1–9
Katzenbach JR (1993) The wisdom of teams: Creating the high-performance organization. Harvard Business Press
Kuipers BS, De Witte MC (2005) Teamwork: a case study on development and performance. Int J Hum Resour Manag 16(2):185–201
Lanubile F, Ebert C, Prikladnicki R, Vizcaíno A (2010) Collaboration tools for global software engineering. IEEE Softw 2:52–55
Luther K, Caine K, Ziegler K, Bruckman A (2010) Why it works (when it works): Success factors in online creative collaboration. In: Proceedings of the 16th ACM international conference on Supporting group work, pages 1–10. ACM
Maalej W, Happel H-J (2009) From work to word: How do software developers describe their work?. In: Mining Software Repositories, 2009. MSR’09. 6th IEEE International Working Conference on, pages 121–130. IEEE
Maalej W, Happel H-J (2010) Can development work describe itself?. In: Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on, pages 191–200. IEEE
Mistrík I, Grundy J, Van der Hoek A, Whitehead J (2010) Collaborative software engineering: challenges and prospects. In: Collaborative Software Engineering, pages 389–403. Springer
Mockus A (2009) Succession: Measuring transfer of code and developer productivity. In: Proceedings of the 31st International Conference on Software Engineering, pages 67–77. IEEE Computer Society
Mockus A (2010) Organizational volatility and its effects on software defects. In: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, pages 117–126. ACM
Moe NB, Dingsøyr T, Dybå T (2010) A teamwork model for understanding an agile team: A case study of a scrum project. Inf Softw Technol 52(5):480–491
Nagappan N, Murphy B, Basili V (2008) The influence of organizational structure on software quality: an empirical case study. In: Proceedings of the 30th international conference on Software engineering, pages 521–530. ACM
Nakakoji K, Yamada K, Giaccardi E (2005) Understanding the nature of collaboration in open-source software development. In: Software Engineering Conference, 2005. APSEC’05. 12th Asia-Pacific, 8 pp. IEEE
Nakakoji K, Ye Y, Yamamoto Y (2010) Supporting expertise communication in developer-centered collaborative software development environments. In: Collaborative Software Engineering, pages 219–236. Springer
Nguyen T, Wolf T, Damian D (2008) Global software development and delay: Does distance still matter?. In: Global Software Engineering, 2008. ICGSE 2008. IEEE International Conference on, pages 45–54. IEEE
Nohria N, Eccles R (1994) Networks and organizations: structure, form, and action. Harvard Business School Press
Pagano D, Maalej W (2013) How do open source communities blog? Empir Softw Eng 18(6):1090–1124
Panichella S, Canfora G, Di Penta M, Oliveto R (2014) How the evolution of emerging collaborations relates to code changes: An empirical study. In: 22nd International Conference on Program Comprehension (ICPC). IEEE
Pinzger M, Gall H (2010) Dynamic analysis of communication and collaboration in oss projects. In: Collaborative Software Engineering, pages 265–284. Springer
Posnett D, D’Souza R, Devanbu P, Filkov V (2013) Dual ecological measures of focus in software development. In: 35th International Conference on Software Engineering (ICSE), pages 452–461. IEEE
Rahman F, Devanbu P (2011) Ownership, experience and defects: a fine-grained study of authorship. In: Proceedings of the 33rd International Conference on Software Engineering, pages 491–500. ACM
Redmiles D, Van Der Hoek A, Al-Ani B, Hildenbrand T, Quirk S, Sarma A, Filho R, de Souza C, Trainer E (2007) Continuous coordination: a new paradigm to support globally distributed software development projects. Wirtschaftsinformatik 49(1):28
Roberts J, Hann I-H, Slaughter S (2006) Communication networks in an open source software project. In: Open Source Systems, pages 297–306. Springer
Salas EE, Fiore SM (2004) Team cognition: Understanding the factors that drive process and performance. American Psychological Association
Sarma A, Al-Ani B, Trainer E, Silva Filho RS, da Silva IA, Redmiles D, van der Hoek A (2010) Continuous coordination tools and their evaluation. In: Collaborative Software Engineering, pages 153–178. Springer
Sarma A, Herbsleb J, Van Der Hoek A (2008) Challenges in measuring, understanding, and achieving social-technical congruence. In: Proceedings of Socio-Technical Congruence Workshop, In Conjunction With the International Conference on Software Engineering
Scacchi W (2010) Collaboration practices and affordances in free/open source software development. In: Collaborative software engineering, pages 307–327. Springer
Serebrenik A, van den Brand M (2010) Theil index for aggregation of software metrics values. In: Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–9. IEEE
Takhteyev Y, Hilts A (2010) Investigating the geography of open source software through github
Vasilescu B, Serebrenik A, van den Brand M (2011) You can’t control the unfamiliar: A study on the relations between aggregation techniques for software metrics. In: Software Maintenance (ICSM), 2011 27th IEEE International Conference on, pages 313–322. IEEE
Whitehead J, Mistrík I, Grundy J, van der Hoek A (2010) Collaborative software engineering: concepts and techniques. In: Collaborative Software Engineering, pages 1–30. Springer
Wilson EO (1978) What is sociobiology? Society 15(6):10–14
Xuan Q, Devanbu P, Filkov V (2014) Converging work-talk patterns in online task-oriented communities. arXiv:1404.5708
Xuan Q, Fang H, Fu C, Filkov V (2015) Temporal motifs reveal collaboration patterns in online task-oriented networks. Phys Rev E 91(5):052813
Xuan Q, Filkov V (2013) Synchrony in social groups and its benefits. In: Handbook of Human Computation, pages 791–802. Springer
Xuan Q, Filkov V (2014) Building it together: synchronous development in OSS. In: Proceedings of the 34th International Conference on Software Engineering. ACM
Xuan Q, Gharehyazie M, Devanbu P, Filkov V (2012) Measuring the effect of social communications on individual working rhythms: A case study of open source software. In: Social Informatics (SocialInformatics), 2012 International Conference on, pages 78–85. IEEE
Xuan Q, Okano A, Devanbu P, Filkov V (2014) Focus-shifting patterns of oss developers and their congruence with call graphs. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 401–412. ACM
Acknowledgments
The authors would like to thank the members of our DECAL research group and Prof. Qi Xuan for the valuable discussion about the ideas and technical details presented in this paper. We also thank Dr. Bogdan Vasilescu for his contributions in designing the survey and for his insightful comments and feedback on this work, and Mehrdad Afshari for his help in improving the paper. The comments by the anonymous reviewers helped us make this paper better, for which we are thankful. Both authors gratefully acknowledge support from the Air Force Office of Scientific Research, award FA955-11-1-0246.
Additional information
Communicated by: Filippo Lanubile
Appendices
Appendix A Developer Questionnaire
The questionnaire is sent to each individual by email. Each email starts with a brief introduction of the authors and our research, after which the recipient is asked to complete the form and submit it to us.
ASF Collaborative Development Questionnaire
* Required
How would you describe your involvement in this project? e.g., project founder, core developer, ...
How frequently did/do you work on this mentioned project? *
◯ Daily
◯ Once per 2-3 days
◯ Once per week
◯ Less than once per week
What are some typical tasks you carried out in this project? Please give a few examples. e.g., fixing bugs, implementing a new feature, ...
How do you choose which tasks to work on? Do you choose your own tasks? How do you prioritize which tasks to work on first?
How long did tasks you worked on typically take, from start to finish? *
If you were part of a bigger task, please answer with the overall task in mind.
◯ 1-2 days
◯ 3-5 days
◯ A week
◯ 2 weeks
◯ Other:
When does work by others influence you / your work directly? *
When it is in the same files you are touching at the time; the same packages; the whole project or something else
◯ The file(s) I am working on
◯ The package(s) I am working on
◯ The whole project
◯ Other:
Which of your tasks do you consider to be more collaborative than the others? e.g., bug fixes, adding new features, ...
How many people do collaborative tasks typically involve?
◯ 2
◯ 3
◯ 4
◯ 5
◯ 6
◯ more
How do you coordinate your work with collaborators on the same task? What communication channels do you use? Do you discuss with them prior to task assignment, during task work, or after task completion?
How do you adjust your working style when collaborating as opposed to during solitary work, if at all? e.g., by committing less frequently, or by pushing smaller commits more frequently, ...
When is it beneficial and when is it detrimental to collaborate with others on the same task?
Please tell us how much you agree or disagree with the following sentences *
Appendix B Verification of Data Mining Scripts
Our scripts are based on scripts developed by Bird et al., which we slightly modified to fit our purposes. Both our scripts and theirs are available at http://www.gharehyazie.com/supplementary/teamwork/miningscripts/. As this data-gathering step is critical to the downstream analyses, we verified the accuracy of the scripts. To that end, we randomly selected three months (June 2008, April 2009, and February 2010) and three of our 26 projects (Abdera, Harmony, and Cayenne). We then manually went through all of the mailing-list messages of the selected projects during the selected months. In total, about 1200 messages were inspected, as follows.
We examined the message senders, subjects, timestamps, thread IDs, and bodies. This information was then compared to the corresponding entries for the messages in the projects’ mailing list archives, available at http://mail-archives.apache.org/mod_mbox/. While almost everything was consistent with the original archives, two issues were discovered:
1. The timestamps of messages stored in our database were off by a few hours compared to the archives. Upon further investigation, we traced the issue to the way we parsed timezone information. This inconsistency does not affect our results, since it shifts message timestamps by at most one day and our study is insensitive to time differences at that resolution.
2. The last message of each month was not recorded in our database. This resulted in a difference of 12 messages per project per year between our database and the actual archives, a discrepancy of 1%.
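The cross-check described in this appendix, and both discrepancies it surfaced, can be sketched in a few lines of Python. The sketch is hypothetical (the dictionary layout keyed by message ID is an assumption, not taken from the mining scripts mentioned above): it parses each archived message's RFC 2822 `Date` header with the standard library's `email.utils.parsedate_to_datetime`, normalizes both sides to UTC so that timezone handling cannot introduce an offset, and reports messages missing from the database together with any nonzero timestamp offsets.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Tuple


def to_utc(dt: datetime) -> datetime:
    """Normalize to UTC; naive datetimes are assumed to already be UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def compare(
    archive: Dict[str, str],        # message ID -> raw RFC 2822 Date header
    database: Dict[str, datetime],  # message ID -> stored timestamp
) -> Tuple[List[str], Dict[str, timedelta]]:
    """Return (IDs missing from the database, nonzero offsets db - archive)."""
    missing = sorted(mid for mid in archive if mid not in database)
    offsets = {}
    for mid, header in archive.items():
        if mid in database:
            delta = to_utc(database[mid]) - to_utc(parsedate_to_datetime(header))
            if delta:  # timedelta(0) is falsy
                offsets[mid] = delta
    return missing, offsets


if __name__ == "__main__":
    archive = {
        "<a@example.org>": "Mon, 02 Jun 2008 10:00:00 -0700",
        "<b@example.org>": "Mon, 30 Jun 2008 23:59:00 +0000",  # last of month
    }
    # Simulates a parser that dropped the -0700 offset and stored 10:00 as UTC.
    database = {"<a@example.org>": datetime(2008, 6, 2, 10, 0, tzinfo=timezone.utc)}
    print(compare(archive, database))
```

In the example, the dropped last-of-month message shows up in the missing list and the mishandled offset appears as a seven-hour shift, the two kinds of discrepancy reported above. Taken together, 12 missing messages per project per year amounting to 1% implies on the order of 1200 archived messages per project per year.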
Cite this article
Gharehyazie, M., Filkov, V. Tracing distributed collaborative development in apache software foundation projects. Empir Software Eng 22, 1795–1830 (2017). https://doi.org/10.1007/s10664-016-9463-3
DOI: https://doi.org/10.1007/s10664-016-9463-3