The small-world and scale-free structure of an internet technological community

Dimitris Assimakopoulos

THE SMALL WORLD AND SCALE-FREE STRUCTURE OF AN INTERNET TECHNICAL COMMUNITY

Jie Yan
Grenoble Ecole de Management, 12 rue Pierre Semard, BP127, 38003 Grenoble, France
jie.yan@grenoble-em.com

Dimitris Assimakopoulos
Grenoble Ecole de Management, 12 rue Pierre Semard, BP127, 38003 Grenoble, France
dimitris.assimakopoulos@grenoble-em.com

Two network topology models, ‘small world’ and ‘scale free’ networks, have recently been introduced by physicists and mathematicians, generating growing interest among natural and social scientists alike. In this paper, we analyse the structure of the questioning and replying network in a very large Internet technical community, China Software Development Net (CSDN), which is the biggest Chinese language software technical forum, with over one million registered members in early 2006. Results reveal that the CSDN network presents both small world and scale free properties. The technology and knowledge management implications of this network structure are discussed with respect to technical knowledge and innovation diffusion.

Keywords: Network topology, small world network, scale free network, Internet community.

1. Introduction

In the past decade or so, virtual communities have become increasingly popular. People join online Internet communities to interact, make friends and share information and knowledge about their common interests. Knowledge and innovation management scholars have recognized the value added by such online informal interaction across organizational boundaries, which enables work in inter-organizational communities and networks of practice (Brown and Duguid 2001, Teigland 2003). Professionals share work related information and knowledge in such online communities, seeking advice to solve technical problems (Assimakopoulos and Yan 2006), exploring opportunities for business collaboration and enabling personal professional development (Wasko and Faraj 2000).
In parallel, two network topology models, ‘small world’ and ‘scale free’ networks, have been introduced by physicists and mathematicians (Watts and Strogatz 1998, Barabasi and Albert 1999, Newman 2001), generating considerable interest among natural and social scientists alike. Empirical research has been carried out using these models to explore neuronal networks (Watts 1999), biological networks (Koch and Laurent 1999), scientific collaboration networks (Newman 2001), email networks (Ebel et al 2002), telecommunication networks (Schintler et al 2003), airline transport networks (Guimerà et al 2005), and online communities (Ravid and Rafaeli 2004, Adamic et al 2003).

We apply and test these two network models in exploring the structural properties of an Internet technical community, China Software Development Net (CSDN), and in investigating how these specific network structures influence the practice of knowledge diffusion in the community. CSDN is the biggest Chinese language software technical forum, with over one million registered members in January 2006. The members are located throughout mainland China and South-East Asia. The forum consists of 30 discussion sub-forums specializing in different software and hardware technology areas. Every day tens of thousands of software engineers visit the forum, producing thousands of discussion threads about the technical problems they meet in their daily development work. We gathered relational data on questioning and replying linkages in CSDN during a three month period in the fall of 2005. 74,066 members participated in the online technical discussion during the three month period of data collection, delivering 188,776 threads consisting of 1,207,433 reply posts.

This paper is divided into seven additional sections. In the next section, we briefly discuss network communities. In section 3, we present a profile of the technical discussion activities in the CSDN forum.
In section 4 we discuss the network topology models developed in the past decade or so. Section 5 focuses on methodological issues, introducing the process of data collection and social network analysis. Section 6 presents the results of data analysis, highlighting the structural properties of the CSDN network according to the topology models presented in section 4. Section 7 discusses these results further and explains their implications for technology management and practice. Last but not least, section 8 highlights some directions for further research.

2. Network Communities: Offline and Online

Continuous development of transportation and communication technologies has greatly expanded the reach of people’s social activities in the past decades. In contemporary society, people interact not only with others in their local neighborhood, but also with people who live across large geographical distances. This long-distance interaction has transformed people’s understanding of the notion of community, from a localized concept to a sparsely knitted interpersonal network of social ties, i.e. a network community (Wellman 1979, Wellman et al 1997). Consistent with each person’s multiple social relations, for example as employee, friend, neighbor, family member, or sports club fan, everybody maintains several corresponding social networks at the same time. People benefit from their personal networks by obtaining resources, seizing opportunities, and reducing uncertainty (Wellman and Gulia 1999, Burt 2004).

An important type of network community is the interest-based community, which comprises people sharing a common interest. Some of these communities are based on a hobby, for example, antique collectors or sports fans. They provide opportunities, channels, and venues for community members to share time, and informally share resources, such as information, personal ideas and feelings. Traditionally, these communities exist locally.
Some have regular meetings or events, in which members gather and share some time face-to-face. As people who have similar interests do not always live close to each other, they need an easy way to communicate with each other. Since the early 1990s, the Internet has significantly enabled long distance communication and interaction. Interest-based communities increasingly use mailing lists or their own websites to connect members, share information and exchange messages (Castells 2000).

According to Rheingold (2000), Internet virtual communities are the ‘social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling, to form webs of personal relationship in the cyberspace’. Schubert and Ginsburg (2000) describe Internet communities as the union between individuals or organizations who share common values and interests, using electronic media to communicate within a shared semantic space on a regular basis. The emergence of virtual communities arises from people’s need to gather and participate in informal public spaces in everyday life and their primordial wish to look for a sense of ‘community’. Jones (1995) takes the same view, that Internet based ‘discussion groups’ spring from the need to rebuild a sense of community in everyday life.

Virtual communities take various forms of computer mediated group communication, such as mailing lists, newsgroups, bulletin boards, chat rooms, and multiple user domains. In some virtual communities, members never see each other in real life. Many virtual communities focus on work related professional practices, for example, scholars in academia (Koku and Wellman 2003), lawyers (Hara 2000), computer professionals (Wasko and Teigland 2002), and open source software developers (Lakhani and von Hippel 2000; O’Mahony and Ferraro 2004).
These virtual communities provide opportunities, channels, and venues for professionals to share everyday work related resources, not just information, but also innovative ideas, solutions to specific problems, professional knowledge, and the latest thinking in their field of interest. Many participants treat such virtual communities as a place for learning and professional problem solving (Assimakopoulos and Yan 2006). Participants benefit from these communities by creating, accessing and exchanging new information, knowledge, expertise and innovative ideas not available in their local community of practice (Brown and Duguid 1998) and work environment. Current research on virtual communities mainly follows qualitative approach, analyzing the function and impact of virtual communities to social life, knowledge sharing and business practices (see the above references). Few studies were made to quantitatively understand the structural properties of this new form of community. This research adopts social network analysis techniques (Wasserman and Faust 1994; Nooy et al. 2005) to model the underlying structure of Internet community and explore the impact of this structure to knowledge and innovation management. 3. China Software Development Net The Internet technical community we studied is China Software Development Net (CSDN). CSDN (www.csdn.net) is the biggest Chinese language software technical forum. Since its foundation in late 1999, the forum has undergone constant and accelerating growth. Up to January 2006, some 1,000,000 software developers, system administrators and other IT professionals from 20,000 IT firms, organisations, etc., from all over mainland China, as well as Taiwan, Hong Kong, Singapore, and Malaysia, have been registered as members in the forum. The forum is based on a bulletin board system, consisting of 30 discussion sub-forums specializing in different software and hardware technology areas. 
During working hours, there are always several thousand members online in the forum taking part in technical discussions.

The online discussions in CSDN are organized in sub-forums oriented to technical topics. Members are free to initiate a thread in any sub-forum by posting a message seeking advice to solve their specific technical problem. Every day thousands of software technical problems are posted and discussed. Most of the online enquiries during working hours get replies within a few hours, if not minutes. The usual reply points out what is probably wrong and gives possible solutions. The senders are often professionals who have experience and knowledge of similar software problems. Most of the replies are less than 5 lines in length. Some posts which include original software code may be longer, up to several pages. In some other, even longer posts, the contents are obviously copied from electronic technical documents. The senders advise the questioner to refer to the documentation for information about solving a particular problem. The vast majority of messages are written in a mixture of Chinese and English, i.e. the software program code in English, and the diagnosis of the problem and suggestions for solutions in Chinese.

In some cases, when the problem is rather complex, the questioning-and-replying evolves into an interactive discussion among many interested participants. The answer to a particular question then seems more of a collaborative group result. It is common, after several suggestions have been provided by different respondents, for the member who initially asked the question to test these alternatives and report back to the forum, including original software code for input and output. More diagnosis and suggestions follow on the basis of the new information. In this way, more and more contextual information is provided and the online discussion goes deeper and deeper, adding to the knowledge of all participants.
Although most of the online discussions are completed within 10 exchanges, it is not infrequent to see some enquiries getting 30 or more replies. Sometimes such an interactive discussion lasts a few days.

Such open discussions on technical problems in the Internet forum have a strong impact on knowledge and innovation diffusion at regional and national level. As revealed by our previous research (Assimakopoulos and Yan 2006), Internet technical forums have overshadowed personal networks and become the most important external source for Chinese software engineers to acquire information and knowledge to solve their daily technical problems. Knowledge is shared among the participants, regardless of their physical location, organizational affiliation, and technical background. Successful solutions found through Internet forums are often most valuable since they are very likely completely new to the local software engineering community of practice, and thus have great value for the technical innovation of software companies.

Here we are particularly interested in the network structure of the CSDN community, in order to understand the process of knowledge and innovation diffusion happening in Internet technical communities. What structure does the CSDN community have? How does this specific structure influence knowledge and innovation diffusion in the community? To answer these questions, we consult two network topology models, i.e. small world and scale free networks, developed recently by physicists and mathematicians. These two models have specific characteristics and implications for knowledge and innovation diffusion, as discussed below.

4. Network Topology Models

4.1 Small world network

In large sparse networks, people often maintain only a small number of direct linkages. This however does not mean that people are isolated from the majority of community members.
They can possibly reach many others in the network through a very short chain of ‘friends of friends’ (Boissevain 1973). One of the pioneers in studying this phenomenon, back in the 1960s, was the Harvard psychologist Stanley Milgram. Milgram asked people to send a letter through friends of friends to complete strangers. To his surprise, Milgram found that the letters reached their destinations through chains of, on average, six intermediaries. Milgram’s finding became known as the ‘six degrees of separation’ of a ‘small world’. Despite some methodological ambiguities raised by later works (for example, Kleinfeld 2002), Milgram’s experiment indicates that the average ‘distance’, i.e., Average Path Length (APL), between two randomly chosen persons is small, i.e., six intermediaries, compared to the huge underlying population involved.

Moreover, it was found that in such networks there are many tightly-connected small groups with overlapping internal linkages, but few ties connecting to outsiders. In such small groups, a member’s directly connected neighbors are often also directly connected to each other. For example, in a friendship network, two friends of a given person are likely to be friends as well. This phenomenon is called the ‘clustering effect’ and it is measured by a ‘Clustering Coefficient’ (CC). CC is formally defined as the average fraction of pairs of neighbors of a person which are also neighbors of each other. If the CC is relatively low, there is less clustering effect, and actors are rather isolated at local level, i.e., in the neighborhood of a given actor.

Social networks with small APL and high CC are defined as small world networks (for detailed mathematical modeling, see, for example, Watts and Strogatz 1998). This network topology model has important implications for diffusion theory (Rogers 2003).
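The two indicators defined above can be illustrated on a toy network. The following is a minimal sketch, not the procedure used in this study: it assumes an undirected network stored as a dictionary of neighbor sets (`adj`, a hypothetical format), measures path length in links rather than intermediaries, and skips actors with fewer than two neighbors when averaging the clustering coefficient, which is only one of several conventions.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average fraction of an actor's neighbor pairs that are themselves linked."""
    fractions = []
    for node, neighbors in adj.items():
        pairs = list(combinations(neighbors, 2))
        if not pairs:
            continue  # assumption: actors with fewer than 2 neighbors are skipped
        linked = sum(1 for a, b in pairs if b in adj[a])
        fractions.append(linked / len(pairs))
    return sum(fractions) / len(fractions)


def average_path_length(adj):
    """Mean shortest-path distance (in links) over all reachable ordered pairs."""
    total, count = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# Toy example: a triangle a-b-c with one extra member d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = clustering_coefficient(adj)
apl = average_path_length(adj)
```

In this toy network the triangle gives a high clustering coefficient, while the single tie to `d` keeps every member within two links of every other.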
Information, knowledge or innovation spreads much faster in a network with a small APL, say 6 or less, than in a network with an APL equal to, for example, 100. Many social networks have been shown to be small worlds, exhibiting both a low degree of separation of nodes and high clustering effects. Examples include networks of Hollywood actors who co-star in films (Watts and Strogatz 1998), co-authorship networks of academic papers in a variety of disciplines (Newman 2001), transportation networks (Latora and Marchiori 2002), company directors’ networks of the Fortune 1000 list of firms (Mariolis 2001), and email communication networks (Dodds et al 2003).

4.2 Random network

A random network is defined as a network whose connections between actors happen at random. Random networks are based on two assumptions. Firstly, the size of the network remains unchanged as time elapses. That is, the network does not grow over time. Secondly, the probability of connection between any two nodes is equal for all nodes. That is, a connection happens at random with no preference whatsoever for any network member. As a result, in a random network the number of connections each node has follows a Poisson distribution (Newman 2003).

Random networks also show small world properties, i.e., a small APL compared to the size of the network. Suppose that in a large community with millions of members, everybody has 20 connections formed at random. A person can then reach 20 members directly and, ignoring overlaps between acquaintance circles, up to $20^n$ members through a chain of $n$ links. When $n$ is equal to 5, $20^n$ is equal to 3,200,000. This means that in such a community with more than 3 million members, on average everybody can reach any other member by a chain of 5 ‘friends of friends’.

However, random networks do not demonstrate clustering effects. Because all connections form at random, two friends of the same person are no more likely to be connected to each other than two complete strangers.
Therefore the CC in random networks is very low compared to small world networks. In this sense, CC is the key indicator which distinguishes random from small world networks. A small world network may have an APL equivalent to that of a random network, but it ought to demonstrate a much higher CC than the random network.

4.3 Scale free network

According to some commentators, such as Jeong (2003), the random network topology is a rather over-simplistic model. Key assumptions of random networks simply do not hold when applied to many real world networks. For example, in an Internet community like CSDN, membership does grow over time, and, as in social networks of friendship, linkages obviously do not form at random but show preferential attachment. Individual members show a preference for certain types of members who possess specific attributes and play active roles in the community.

In response to these shortcomings of random networks, Barabasi and Albert (1999) put forward a new topology model. Based on investigations of the structure of the World Wide Web and other social and technical networks, they discovered that these networks show a power law distribution, in which the vast majority of nodes have only a few linkages, and a small number of nodes play a significant role by connecting to an extremely large number of nodes. To put it differently, only a small percentage of actors score high on out-degree centrality (Freeman et al 1991), and the vast majority of actors score very low. Thus the out-degree centrality of actors follows a power law distribution. This uneven distribution of linkages can be explained on the grounds that networks often develop over time, by adding new nodes and new linkages, and that new nodes are more likely to connect to nodes that have already developed a large number of linkages.
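The growth mechanism just described can be sketched in a few lines. This is an illustrative toy model in the spirit of Barabasi and Albert (1999), not their exact formulation and not a model fitted to CSDN: the function name, the fully connected starting core and the parameter values are all assumptions chosen for the example.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, links_per_node=2, seed=0):
    """Grow an undirected network in which newcomers prefer well-connected nodes."""
    rng = random.Random(seed)
    core = links_per_node + 1
    # Assumption: start from a small fully connected core.
    edges = [(i, j) for i in range(core) for j in range(i)]
    # Each node is listed once per linkage it holds, so a uniform draw
    # from `targets` picks a node with probability proportional to its degree.
    targets = [node for edge in edges for node in edge]
    for new in range(core, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges


edges = preferential_attachment(1000)
degree = Counter(node for edge in edges for node in edge)
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
print("most common degree:", Counter(degree.values()).most_common(1)[0][0])
```

Under these assumptions a handful of early nodes typically accumulate far more linkages than the typical node, which is the uneven pattern the text attributes to growth combined with preferential attachment.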
A network with such a power law degree distribution is termed a ‘scale free network’ (for detailed mathematical modelling, see Albert and Barabasi, 2002). So far, it has been shown that a broad range of networks, from biological networks (Koch and Laurent 1999), to cellular, protein and metabolic networks (Jeong et al 2000), email networks (Ebel et al 2002), the world wide web (Broder et al 2000), and telecommunication networks (Schintler et al 2003), are scale free networks.

Although the above topological models were primarily developed by physicists, they have recently gained increasing attention in social science (Granovetter 2003) and management (Schilling and Phelps 2004a, 2004b) research. It is also worth noting that, among the network studies mentioned above, several commentators have shown that a variety of networks are both small world and scale free networks. Examples include the movie co-star networks (Watts and Strogatz 1998), co-authorship networks (Newman 2001), Internet discussion groups (Adamic et al 2003, Ravid and Rafaeli 2004) and the airline transport network (Guimerà et al 2005).

A weakness of these topology models, however, is that they only model the existence or absence of a connection between any two nodes, and assume mutuality of ties, ignoring the ‘strength’ or ‘frequency’ of connection between actors. In other words, all linkages are viewed as equally important. Many real life networks however show diversity in the ‘strength’ or ‘significance’ of linkages. For example, in advice seeking networks among Chinese software engineers, some pairs of engineers have more frequent discussions about technical problems than others (Assimakopoulos and Yan 2006).

5. Methodology

This research follows a Social Network Analysis (SNA) perspective (Wasserman and Faust 1994; Nooy et al. 2005) in modeling network structure. SNA is a set of theories and methods used to uncover and describe the underlying structure of social relationships.
SNA assumes that individuals are embedded in social networks and how people behave primarily depends upon how they are tied to, and embedded with others, or where they are located in ongoing networks of relationships. By exploring questions like who is linked to whom, what is the nature of the linkages, and how do the linkages affect the actors’ and the communities’ behavior, SNA seeks to model social relationships and to describe the structure of any social group or community. The power of SNA stems from its fundamental difference from non-network sociological studies. In a non-network study, researchers often focus on attributes of individual actors, who are viewed as isolates and under-socialized profit maximizing seekers and agents. From a SNA perspective attributes of individuals are less important than their ongoing relationships and ties with other actors within the social network they are embedded. The behavior of actors therefore arises and is guided by structural or relational processes and norms arising out of the group and social network in which they are embedded. In many empirical research settings, SNA has shown significant advantage as the result of inclusion of relational information among social actors, in conjunction with statistical analysis of independent variables and attributes about the same set of actors. On top of using such metrics as mean, median, standard deviation, etc. borrowed from statistics, SNA provides an additional set of quantitative and qualitative concepts, vocabulary, and techniques to analyze relational data. For example, SNA uses the term ‘degree’ to measure how many other actors directly link to a certain actor; uses ‘centrality’ to measure how critical an actor is in a network according to a measure such as ‘degree’; and ‘density’ to measure how closely a group of actors are connected. 
There are many other important quantitative concepts, for example, centralization, component, clique, role, position, and so on (Wasserman and Faust 1994). In this paper, we adopt SNA techniques to analyze the distribution of the actors’ in and out degree in the CSDN technical community.

The collection of relational data in terms of questioning and replying linkages took place in CSDN during a three month period from October 3rd to December 31st of 2005 (91 days, 13 weeks). 74,066 software engineers took part in online discussion during this period and produced 188,776 threads of discussion, consisting of 1,207,433 reply posts.

During the three months, as shown by Figure 1, 26,330 threads (13.9%) did not get any reply; 50.2% of threads were completed within 1 to 5 replies; 21.3% within 6 to 10 replies; and only 4.9% of threads got more than 20 replies. The ‘hottest’ question got 937 replies. On average, every thread had 6.4 replies, with a standard deviation of 11.0.

[Figure 1: Distribution of the length of the discussion threads in CSDN community. Horizontal axis: number of replies in the thread; vertical axis: number of threads. Bar labels: 0 replies, 13.9%; 1-5, 50.2%; 6-10, 21.3%; 11-15, 7.0%; 16-20, 2.7%; 21-30, 2.3%; ≥ 31, 2.6%.]

The thread data were recoded into person-to-person replying linkages. There are 770,913 linkages in total among the 74,066 nodes in the network. The linkages are directed from the member who posted the replying message to the member who asked the question. In the online discussions, many people who initiated threads also took part in the discussion; this produced 39,247 self-replying linkages, i.e. loops, which do not give us meaningful information in terms of network structure. After removing these loops, there were 731,666 directed linkages in the CSDN network under study.

It is also worth noting that the replying linkages are valued. One member may post more than one reply in one or many threads initiated by another member.
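The recoding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used for the study: the input format (one `(asker, [repliers])` pair per thread) and the function names are hypothetical, and the member names are invented.

```python
from collections import Counter


def threads_to_linkages(threads):
    """Recode threads into valued, directed replier -> asker linkages.

    `threads` is a hypothetical format: an iterable of
    (asker, [replier, replier, ...]) pairs, one pair per thread.
    The value of a linkage is the number of replies it carries.
    """
    valued = Counter()
    for asker, repliers in threads:
        for replier in repliers:
            valued[(replier, asker)] += 1
    return valued


def drop_loops(valued):
    """Remove self-replying linkages (loops)."""
    return Counter({pair: v for pair, v in valued.items() if pair[0] != pair[1]})


# Invented example: ann asks a question and also replies in her own thread.
threads = [
    ("ann", ["bob", "cat", "ann", "bob"]),
    ("bob", ["ann"]),
    ("cat", []),  # a thread that received no reply
]
valued = threads_to_linkages(threads)
links = drop_loops(valued)
mean_value = sum(links.values()) / len(links)
```

Here `bob` replies twice to `ann`, so the linkage from `bob` to `ann` has value 2, while `ann`'s reply in her own thread is a loop and is discarded.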
The linkages with higher value are more data intensive and perhaps share more information than the linkages with lower value. In other words, they are perceived as ‘strong’ linkages. In the CSDN network, the average value of linkages is 1.31, with a standard deviation of 1.22. The ‘strongest’ linkage has a value of 116. Table 1 shows the distribution of intensity of these linkages.

Table 1: Distribution of intensity of the linkages

Intensity (value) of the linkages    Number     Percentage (%)
1                                    610,267    83.4
2                                    80,385     11.0
3                                    21,962     3.0
4                                    8,142      1.1
5                                    3,954      0.5
≥6                                   6,956      1.0

However, scale free and small world network models take account only of the existence of linkages, and ignore the intensity of communication. This is one of the drawbacks of these models (Granovetter 2003). To test whether the network under investigation is scale free or small world, we simply “binarized” all linkages; 1 denotes the existence of a ‘reply’ linkage, while 0 denotes no linkage between any two members of the network.

For the data analysis we used the following software packages: Pajek (Batagelj and Mrvar 2004), Ucinet (Borgatti et al. 2002), SPSS and Microsoft Excel.

6. Results

6.1. Small world network

The APL and CC are the key indicators for detecting a small world network topology. A small world network should have a small APL, like a random network, and a CC orders of magnitude higher than that of the equivalent random network. A small world network topology therefore has $\mathrm{APL} \approx \mathrm{APL}_{random}$ and $\mathrm{CC} \gg \mathrm{CC}_{random}$. Watts and Strogatz (1998) adopted the same criteria to test whether a network is a small world, i.e., $\mathrm{APL}/\mathrm{APL}_{random} \approx 1$ and $\mathrm{CC}/\mathrm{CC}_{random} \gg 1$. The APL and CC of a random network can be calculated as follows:

$$\mathrm{APL}_{random} = \frac{\ln(n)}{\ln(k)}, \qquad \mathrm{CC}_{random} = \frac{k}{n}$$

where $n$ is the number of actors in the network, and $k$ is the average out degree of the network (for detailed mathematical modelling, see Newman 2003). In the CSDN network, APL = 4.5529 and CC = 0.05389.
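The random network baselines defined above are straightforward to evaluate. The sketch below plugs in the network size, mean out degree and observed indicators reported for CSDN in this section; the function name is ours and the snippet is only a check of the arithmetic, not part of the original analysis.

```python
import math


def random_baseline(n, k):
    """APL and CC expected for a random network with n actors and mean degree k."""
    apl_random = math.log(n) / math.log(k)
    cc_random = k / n
    return apl_random, cc_random


# Values reported for the CSDN network in this section.
n, k = 74066, 9.8786
apl_observed, cc_observed = 4.5528, 0.05389  # as listed in Table 2

apl_random, cc_random = random_baseline(n, k)
apl_ratio = apl_observed / apl_random
cc_ratio = cc_observed / cc_random
print(f"APL_random = {apl_random:.4f}, CC_random = {cc_random:.7f}")
print(f"APL ratio = {apl_ratio:.4f}, CC ratio = {cc_ratio:.1f}")
```

An APL ratio close to 1 together with a CC ratio far above 1 is the small world signature that the criteria above look for.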
The size of the CSDN network is $n = 74066$, and the average out degree of its actors is $k = 9.8786$. For a random network with the same number of nodes and linkages, $\mathrm{APL}_{random} = \ln(n)/\ln(k) = \ln(74066)/\ln(9.8786) = 4.8956$, and $\mathrm{CC}_{random} = k/n = 0.0001334$; see Table 2.

Table 2: Average path lengths and clustering coefficients for CSDN and equivalent random network

Indicator    CSDN network    Theoretical random network with same number of nodes and linkages    Ratio between CSDN network and random network
APL          4.5528          4.8956                                                               0.9300
CC           0.05389         0.0001334                                                            404.0

The APL of the CSDN network is small, 4.5528. This suggests that a message from a member can reach any other member through fewer than 5 intermediaries on average. More importantly, the CC of the CSDN network is 404 times that of a random network with the same number of nodes and linkages. In other studies of small world networks, the ratio $\mathrm{CC}/\mathrm{CC}_{random}$ is also high. For example, in Ravid and Rafaeli’s (2004) university online discussion groups, this ratio is 216; in Adamic et al.’s (2003) Stanford University student Internet social club, this ratio is 40. Therefore we argue that the CSDN network has a small world topology.

6.2. Scale free network

As discussed above, in a scale free network the degree of actors follows a power law distribution. In general, a power law is represented by the formula

$$Y = a_0 X^{a_1}$$

where $a_0$ and $a_1$ are coefficients.

We are particularly interested in the distribution of the out degree in the CSDN community, because the outward linkages are the key to the vitality of the community. The members with high out degree form the core of the network, animating the online discussion and keeping the community sustainable over time. Therefore in our analysis the independent variable X is the out degree, and the dependent variable Y is the number of actors who have that out degree, i.e. the frequency.
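The frequency variable just defined, and the log-log least squares fit that the following paragraphs report from SPSS, can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' procedure: the link format (binarized, loop-free `(replier, asker)` pairs) and function names are hypothetical, members who never reply are left out because lg(0) is undefined, and the synthetic data are constructed to follow an exact power law.

```python
import math
from collections import Counter


def out_degree_frequency(links):
    """Map each out degree value to the number of members who have it.

    `links` is an iterable of (replier, asker) pairs, each pair listed once.
    Members with out degree 0 never appear as a replier and are omitted.
    """
    out_degree = Counter(replier for replier, _ in links)
    return Counter(out_degree.values())


def fit_loglog(frequency):
    """Ordinary least squares for lg(frequency) = b0 + b1 * lg(out degree)."""
    xs = [math.log10(d) for d in frequency]
    ys = [math.log10(f) for f in frequency.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Tabulating frequencies from a tiny invented link list.
toy_links = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "x")]
toy_frequency = out_degree_frequency(toy_links)

# Synthetic frequencies following frequency = 1000 * (out degree)^-2 exactly.
synthetic = {1: 1000, 2: 250, 5: 40, 10: 10}
b0, b1 = fit_loglog(synthetic)
a0, a1 = 10 ** b0, b1
```

For the synthetic data the fit recovers the generating coefficients, which is the sense in which a straight line on log-log axes indicates a power law.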
In terms of these variables, the power law can be written as:

$$\text{frequency} = a_0 \cdot (\text{out degree})^{a_1}$$

Taking logarithms of both sides, we get a linear model of the relation between lg(frequency) and lg(out degree):

$$\lg(\text{frequency}) = b_0 + b_1 \cdot \lg(\text{out degree})$$

where $b_0 = \lg(a_0)$ and $b_1 = a_1$.

We adopt a linear regression to model the CSDN network and find the coefficients $b_0$ and $b_1$, and then calculate $a_0$ ($= 10^{b_0}$) and $a_1$ ($= b_1$). Table 3 shows the linear regression model as an output of SPSS.

Table 3: The linear regression model between lg(frequency) and lg(out degree)
(independent variable: out degree; dependent variable: frequency)

Quantity          Value
b0                5.122
b1                -2.002
R2                0.929
Sig               0.000
a0 (= 10^b0)      132434.2
a1 (= b1)         -2.002

Therefore, we get:

$$\lg(\text{frequency}) = 5.122 - 2.002 \cdot \lg(\text{out degree})$$

This is a very strong linear relation between lg(frequency) and lg(out degree), with $R^2 = 0.929$. The linear relation is significant (p<0.001). Converting the formula back to a power function, we get:

$$\text{frequency} = 132434.2 \cdot (\text{out degree})^{-2.002}$$

Figure 2 presents the power law distribution of the out degree of the CSDN participants, and the linear model in logarithmic scale.

[Figure 2: Power law distribution of the out degree of nodes in CSDN network. Horizontal axis: out degree of nodes; vertical axis: frequency.]

The above analysis reveals that the out degree of the participants follows a significant power law distribution, and indicates that the CSDN network is also a scale free network.

7. Discussion

As the above analysis reveals, the questioning and answering network in the CSDN online software technical community has both a small world and a scale free network topology. Its specific topological structure has significant implications for the knowledge and innovation diffusion activities in the community.

7.1. Implications of small world topology

In the CSDN software engineering community, there is a great deal of heterogeneity rooted in the large number of participants coming from a wide range of technical backgrounds.
Tens of thousands of software engineers actively discuss technical questions in the 30 sub-forums focusing on different technical topics, making the CSDN community an active virtual venue for information and knowledge sharing.

As CSDN is a small world network, its very high clustering coefficient, 404 times that of an equivalent random network, indicates that there are a huge number of cohesive clusters in the network. The members of each cluster generally share a similar technical background and a common interest in a specific technical topic, so that they have frequent discussions involving everybody in the cluster. Frequency and intensity of interaction facilitate the development of trust and reciprocity norms that can increase an individual’s willingness to exchange information (Bouty 2000). Information and knowledge circulate rapidly in these clusters due to the members’ homogeneous background and frequent interaction.

As a result of the short average path length, participants in the CSDN community are easily connected through chains of a small number of members. Information and knowledge can travel relatively easily from one software engineer to any other engineer in the community. The short path length indicates an abundance of weak ties (Granovetter 1973) connecting the cohesive clusters, providing boundary-crossing channels for the specific knowledge of each homogeneous cluster to flow from one to the other. The integration of information and knowledge from heterogeneous sources is important to innovation and to the development of software technology practice, both at the micro level of the company and at the macro regional and national level as a whole.

7.2. Implications of scale free topology

The out degree of CSDN members follows a power law distribution, indicating that a very small number of the participants send a large percentage of the replying linkages, while the majority of the members participate in only a few threads of discussion.
In social network analysis, actors with a large number of direct linkages are termed hubs; they play an important role in providing connectivity for the network.

Table 4: Linkages sent from the top repliers

Percentage of top repliers    Percentage of linkages sent
0.01%                         2.52%
0.1%                          10.56%
1%                            38.33%
5%                            69.76%
10%                           81.26%

Out degree is of particular interest to network researchers, because the hubs of outward linkages form the core of the community. As shown in Table 4, in the CSDN network the top repliers, accounting for only 0.1% of the members, send more than 10% of the replying linkages, while the top 1% of members maintain nearly two fifths of the replying linkages. Nearly 70% of the replying linkages come from the top 5% of repliers. The members who have a high out degree are generally technical experts. They have solid experience and skills in software development and, more importantly, have the goodwill and passion to help others solve technical problems. Some of the active members sent a huge number of replying linkages; for example, the top three repliers have 4,775, 2,206 and 1,854 outward linkages during the three-month period. This suggests that these three members initiate 79.6, 36.8 and 30.9 new replying linkages every day on average (over 60 working days). When we calculate the replying linkages initiated by the core members at different levels, we find that the top 0.01% of the active repliers (7 in number) initiate 34.1 new replying linkages every day; the top 0.1% of core members (74 in number) initiate 17.4 new linkages; and the top 1% of core members (741 in number) initiate 6.3 linkages every day. These figures are remarkably high when we consider that one linkage may carry many replying messages during the three-month period.

Due to the extremely uneven distribution of the linkages, information in scale free networks can diffuse rapidly.
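The concentration figures in Table 4 and the daily rates quoted for the top three repliers can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `share_sent_by_top` is a hypothetical helper, and `toy_out_degrees` is an invented list rather than the CSDN data; only the three linkage totals and the 60 working days are taken from the text.

```python
def share_sent_by_top(out_degrees, top_fraction):
    """Fraction of all outward linkages sent by the top `top_fraction` of members."""
    ranked = sorted(out_degrees, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Toy out-degree list (assumption, not CSDN data): three heavy repliers and
# 97 members with one outward linkage each, 177 linkages in total.
toy_out_degrees = [50, 20, 10] + [1] * 97
top_1_percent = share_sent_by_top(toy_out_degrees, 0.01)    # 50 / 177
top_10_percent = share_sent_by_top(toy_out_degrees, 0.10)   # 87 / 177

# Daily rates of the top three repliers, using the totals and the
# 60 working days stated in the text.
WORKING_DAYS = 60
top_three_linkages = [4775, 2206, 1854]
daily_rates = [round(total / WORKING_DAYS, 1) for total in top_three_linkages]
print(top_1_percent, top_10_percent, daily_rates)
```

Applied to the full list of member out degrees, the same helper would yield one row of Table 4 per chosen fraction.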
In traditional information diffusion and innovation adoption models, which are built on random networks, the number of receivers follows an S-shaped curve. Only when the number of information bearers or new innovation users surpasses a critical threshold does the diffusion speed up and spread system-wide in an exponential fashion. Below the threshold, the diffusion of information or innovation will die out (Valente 1994).

[Figure 3: Diffusion curve in scale free networks (x-axis: time; y-axis: users; curves: normal diffusion curve, diffusion curve in scale free networks)]

Physicists have recently shown that in scale free networks the threshold of explosion is actually zero (Dezso and Barabasi 2002, Gronlund and Holme 2005). Due to the ‘scale free’ connectivity of hubs, new information or innovations reach the hubs almost immediately after entering the network, and then spread rapidly to the huge number of nodes connected to the hubs. The information will saturate the network in just a few steps of diffusion. In the CSDN community, this means that a new technology can be spread to all the community members through the questioning and replying network in a very short period of time.

The overwhelming role that hubs play in scale free networks also brings a problem: the network is very vulnerable to intentional, systematic attacks. Scale free networks are very robust under random attacks. The network would survive even if up to 80% of the nodes were randomly removed. In the case of intentional attacks, however, the network would collapse once the key hubs are destroyed. As shown above, the top 1% of active members of the CSDN community, 740 in number, have nearly 40% of the outward linkages. It is likely that if we removed these 740 active members from the network, the community, which has one million members, would collapse, because when a question is posted in the forum, nobody would answer it.
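The contrast between random and intentional attacks can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not an analysis of the CSDN network: it grows an undirected network by preferential attachment (all function names are hypothetical), removes 5% of the nodes either at random or in order of degree, and compares the share of nodes left in the largest connected component.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow an undirected graph in which each new node links to m existing
    nodes chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    pool = []  # every node appears once per incident edge
    for i in range(m + 1):          # start from a small complete core
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in chosen:
            adj[new].add(target)
            adj[target].add(new)
            pool += [new, target]
    return adj


def largest_component_fraction(adj, removed):
    """Share of the original nodes in the largest component after removal."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adj)


rng = random.Random(42)             # fixed seed so the run is repeatable
graph = preferential_attachment_graph(2000, 2, rng)
n_removed = 100                     # 5% of the 2000 nodes

random_victims = rng.sample(list(graph), n_removed)
hubs_first = sorted(graph, key=lambda node: len(graph[node]), reverse=True)
targeted_victims = hubs_first[:n_removed]

after_random = largest_component_fraction(graph, random_victims)
after_targeted = largest_component_fraction(graph, targeted_victims)
print(after_random, after_targeted)
```

In this toy setting the hub-first removal is expected to fragment the network considerably more than the random removal of the same number of nodes; the sizes and fractions are arbitrary choices for illustration.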
The removal of the key hubs, however, must happen simultaneously, as scale free networks have a strong capability for self-healing. When an insufficient number of hubs is removed, new hubs emerge rapidly with the growth of the network.

8. Further Research

In this paper, we have primarily analyzed the structural properties of the CSDN software engineering community following the scale free and small world models. Further research will proceed in the following directions:

1. Analysis at the individual level. What are the characteristics of the key actors who play different roles in the community as bridges, hubs, or opinion leaders? How can we quantitatively measure and evaluate their roles in the community?
2. Analysis at the sub-structure level. How are the cohesive groups in the Internet technical community organized? What are the topological features of these sub-structures? Are they free of the restrictions imposed by the geographic distribution of community members? What are the underlying disciplines that group the members?
3. The dynamics of the community. How did the structure of the network evolve during the past five years, as the community grew from a start-up to one million members? What happened to the power-law coefficients a0 and a1, the average path length (APL) and the clustering coefficient (CC) during this process?

References

1. Adamic, L.A., Buyukkokten, O. and Adar, E. (2003) A social network caught in the web, First Monday, 8 (6), [online] available at [www.firstmonday.dk/issues/issue8_6/adamic/]
2. Albert, R. and Barabasi, A.L. (2002) Statistical mechanics of complex networks, Reviews of Modern Physics, 74: 47-97.
3. Assimakopoulos, D. and Yan, J. (2006) Sources of Knowledge Acquisition for Chinese Software Engineers, R&D Management, 36 (1): 97-106.
4. Barabasi, A.L. and Albert, R. (1999) Emergence of scaling in random networks, Science, 286: 509-512.
5. Batagelj, V. and Mrvar, A. (2004) Pajek: Program for Large Network Analysis, [online] available at [http://vlado.fmf.uni-lj.si/pub/networks/pajek/]
6. Boissevain, J. (1974) Friends of Friends: Networks, Manipulators, and Coalitions, Oxford: Blackwell.
7. Borgatti, S.P., Everett, M.G. and Freeman, L.C. (2002) Ucinet for Windows: Software for Social Network Analysis, Harvard, MA: Analytic Technologies.
8. Bouty, I. (2000) Interpersonal and interaction influences on informal resource exchanges between R&D researchers across organizational boundaries, Academy of Management Journal, 43: 50-66.
9. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J. (2000) Graph Structure in the Web, Computer Networks, 33: 309-320.
10. Brown, J.S. and Duguid, P. (2001) Knowledge and organization: a social-practice perspective, Organization Science, 12 (2): 198-213.
11. Brown, J.S. and Duguid, P. (1998) Organizing Knowledge, California Management Review, 40 (3): 90-112.
12. Burt, R. (2004) Structural Holes and Good Ideas, American Journal of Sociology, 110 (2): 349-99.
13. Castells, M. (2000) The Rise of the Network Society (2nd edition), Oxford: Blackwell.
14. Dezso, Z. and Barabasi, A.L. (2002) Halting viruses in scale-free networks, Physical Review E, 65, 055103 (R).
15. Dodds, P., Muhamad, R. and Watts, D.J. (2003) An experimental study of search in global social networks, Science, 301: 827-829.
16. Ebel, H., Mielsch, L.I. and Bornholdt, S. (2002) Scale-free topology of e-mail networks, Physical Review E, 66.
17. Freeman, L.C., White, D.R. and Romney, A.K. (eds.) (1991) Research Methods in Social Network Analysis, George Mason University Press.
18. Granovetter, M. (2003) Ignorance, knowledge and outcomes in a small world, Science, 301: 773-774.
19. Granovetter, M.S. (1973) The Strength of Weak Ties, American Journal of Sociology, 78 (6): 1360-1380.
20. Gronlund, A. and Holme, P. (2005) A network-based threshold model for the spreading of fads in society and markets, Physics, 0505050 v1.
21. Guimerà, R., Mossa, S., Turtschi, A. and Amaral, L. (2005) The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles, Proceedings of the National Academy of Sciences, U.S.A., 102: 7794-7799.
22. Hara, N. (2000) Social construction of knowledge in professional communities of practice: tales in courtrooms, Unpublished doctoral dissertation, Bloomington: Indiana University.
23. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N. and Barabasi, A.L. (2000) The Large-Scale Organization of Metabolic Networks, Nature, 407: 651-654.
24. Jeong, H. (2003) Complex scale-free networks, Physica A, 321: 226-237.
25. Jones, S.G. (1995) Understanding Community in the Information Age, in Jones, S.G. (Ed.), Cybersociety: Computer-Mediated Communication and Community, 10-35, London: Sage.
26. Kleinfeld, J.S. (2002) Could It Be A Big World After All? Society, 39: 61-66.
27. Koch, C. and Laurent, G. (1999) Complexity and the Nervous System, Science, 284 (5411): 96-98.
28. Koku, E. and Wellman, B. (2003) Scholarly Networks as Learning Communities: The Case of Technet, in Barab, B., Kling, R. and Gray, J. (Eds.), Designing for Virtual Communities in the Service of Learning, Cambridge: Cambridge University Press, 299-337.
29. Lakhani and von Hippel (2000) How Open Source software works: Free user-to-user assistance, MIT Sloan School of Management Working Paper #4117, [available at http://web.mit.edu/evhippel/www/opensource.PDF]
30. Latora, V. and Marchiori, M. (2002) Is the Boston Subway a Small-World Network? Physica A, 314: 109.
31. Mariolis, P. (2001) Interlocking directorates and control of corporations, Social Science Quarterly, 56: 425-439.
32. Newman, M.E.J. (2003) The structure and function of complex networks, SIAM Review, 45 (2): 167-256.
33. Newman, M.E.J. (2001) Scientific collaboration networks: I. Network construction and fundamental results, Physical Review E, 64.
34. Nooy, W., Mrvar, A. and Batagelj, V. (2005) Exploratory Social Network Analysis with Pajek, Cambridge: Cambridge University Press.
35. O’Mahony, S. and Ferraro, F. (2004) Manage the boundary of an open project, Working paper No.537, IESE Business School, University of Navarra, Spain.
36. Ravid, G. and Rafaeli, S. (2004) Asynchronous discussion groups as small world and scale free networks, First Monday, 9 (9), [online] available at [http://www.firstmonday.org/issues/issues9_9/ravid/]
37. Rheingold, H. (2000) The Virtual Community: Homesteading on the Electronic Frontier (revised edition), Cambridge, MA: MIT Press.
38. Rogers, E.M. (2003) Diffusion of Innovations (5th edition), New York: Free Press.
39. Schintler, L.A., Gorman, S.P., Reggiani, R., Patuelli, R. and Nijkamp, P. (2003) Scale-Free Phenomena in Communications networks: A Cross Atlantic Comparison, [online] available at [http://ideas.repec.org/p/wiw/wiwrsa/ersa03p436.html]
40. Schubert, P. and Ginsburg, M. (2000) Virtual communities of transaction: the role of personalization in electronic commerce, Electronic Markets, 10 (1): 45-55.
41. Teigland, R. (2003) Knowledge Networking: Structure and Performance in Networks of Practice, Stockholm: Stockholm School of Economics Press.
42. Valente, T. (1994) Network Models of the Diffusion of Innovations, New Jersey: Hampton Press.
43. Wasko, M. and Faraj, S. (2000) It is what one does: Why people participate and help others in electronic communities of practice, Journal of Strategic Information Systems, 9 (2-3): 155-173.
44. Wasko, M.M. and Teigland, R. (2002) The provision of online public goods: Examining social structure in an electronic network of practice, in the Proceedings of the 23rd International Conference on Information Systems, Barcelona, Spain.
45. Wasserman, S. and Faust, K. (1994) Social Network Analysis: Methods and Applications, Cambridge University Press.
46. Watts, D.J. (1999) Small Worlds: The Dynamics of Networks Between Order and Randomness, Princeton: Princeton University Press.
47. Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of ‘small world’ networks, Nature, 393: 440-442.
48. Wellman, B. (1979) The community question, American Journal of Sociology, 84: 1201-1231.
49. Wellman, B., Wong, R., Tindall, D. and Nazer, N. (1997) A Decade of Network Change: Turnover, Mobility and Stability, Social Networks, 19 (1): 27-51.
