DOI: 10.1145/1453101.1453107
research-article

Latent social structure in open source projects

Published: 09 November 2008

Abstract

Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational "cathedrals" contrast with the "bazaar-like" nature of Open Source Software (OSS) projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. Still, in large, complex, successful OSS projects, we do expect that subcommunities will form spontaneously within the developer teams. Studying these subcommunities and their behaviour can shed light on how successful OSS projects self-organize. This phenomenon could well hold important lessons for how commercial software teams might be organized. Building on well-established techniques for detecting community structure in complex networks, we extract and study latent subcommunities from the email social network of several projects: Apache HTTPD, Python, PostgreSQL, Perl, and Apache ANT. We then validate them against software development activity history. Our results show that subcommunities do indeed arise spontaneously within these projects as the projects evolve. These subcommunities manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour.
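As an illustration of what "detecting community structure" in an email network can mean, the sketch below scores a candidate split of a small graph with Newman-Girvan modularity, a widely used quality measure for network partitions. The abstract does not say which measure or algorithm the authors applied, so this is an assumption for illustration only. The developer names, the edge list, and the two partitions are hypothetical toy data, not taken from any of the studied projects.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2m))^2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical email graph: an edge means two developers exchanged replies.
# Two tightly knit triangles are joined by a single edge (carol - dave).
EDGES = [
    ("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
    ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),
    ("carol", "dave"),
]

# Candidate partition 1: one community per triangle.
SPLIT = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1, "frank": 1}
# Candidate partition 2: everyone in a single community.
MERGED = {name: 0 for name in SPLIT}

if __name__ == "__main__":
    print("two communities:", modularity(EDGES, SPLIT))
    print("one community:  ", modularity(EDGES, MERGED))
```

For this toy graph the two-triangle split scores 5/14 (about 0.357), while putting everyone in one community scores 0. A higher score means the grouping captures more within-group communication than would be expected by chance given each person's number of contacts.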


Published In

SIGSOFT '08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering
November 2008
369 pages
ISBN:9781595939951
DOI:10.1145/1453101
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collaboration
  2. open source software
  3. social networks

Qualifiers

  • Research-article

Conference

SIGSOFT '08/FSE-16

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%
