THE SMALL WORLD AND SCALE-FREE STRUCTURE OF AN INTERNET
TECHNICAL COMMUNITY
Jie Yan
Grenoble Ecole de Management
12 rue Pierre Semard, BP127, 38003 Grenoble, France
jie.yan@grenoble-em.com
Dimitris Assimakopoulos
Grenoble Ecole de Management
12 rue Pierre Semard, BP127, 38003 Grenoble, France
dimitris.assimakopoulos@grenoble-em.com
Two network topology models, ‘small world’ and ‘scale free’ networks, have recently been
introduced by physicists and mathematicians, generating growing interest among natural and social
scientists alike. In this paper, we analyse the structure of the questioning and replying network in a
very large Internet technical community, China Software Development Net (CSDN), which is the
biggest Chinese language software technical forum with over one million registered members in
early 2006. Results reveal that the CSDN network presents both small world and scale free
properties. The technology and knowledge management implications of this network structure are
discussed with respect to technical knowledge and innovation diffusion.
Keywords: Network topology, small world network, scale free network, Internet community.
1. Introduction
In the past decade or so, virtual communities have become increasingly popular. People
join online communities to interact, make friends and share information and
knowledge about their common interests. Knowledge and innovation management
scholars have recognized the value added from such online informal interaction across
organizational boundaries enabling work in inter-organizational communities and
networks of practice (Brown and Duguid 2001, Teigland 2003). Professionals share work
related information and knowledge in such online communities, seeking advice to solve
technical problems (Assimakopoulos and Yan 2006), exploring opportunities for business
collaboration and enabling personal professional development (Wasko and Faraj 2000).
In parallel, two network topology models, ‘small world’ and ‘scale free’ networks,
have been introduced by physicists and mathematicians (Watts and Strogatz 1998,
Barabasi and Albert 1999, Newman 2001), generating considerable interest among natural
and social scientists alike. Empirical research using these models has explored
neuronal networks (Watts 1999), biological networks (Koch and Laurent
1999), scientific collaboration networks (Newman 2001), email networks (Ebel et al
2002), telecommunication networks (Schintler et al 2003), airline transport networks
(Guimerà et al 2005), and online communities (Ravid and Rafaeli 2004, Adamic et al
2003).
We apply and test these two network models in exploring the structural properties of
an Internet technical community, China Software Development Net (CSDN), and
investigating how these specific network structures influence the practice of knowledge
diffusion in the community. CSDN is the biggest Chinese language software technical
forum with over one million registered members in January 2006. The members are
located throughout mainland China and South-East Asia. The forum consists of 30
discussion sub-forums specializing in different software and hardware technology areas.
Every day tens of thousands of software engineers visit the forum, producing thousands of
discussion threads about the technical problems they meet in their daily development
work. We gathered relational data of questioning and replying linkages in CSDN during a
three month period in the fall of 2005. 74,066 members participated in the online
technical discussion during the three month period of data collection, delivering 188,776
threads consisting of 1,207,433 reply posts.
This paper is divided into seven additional sections. In the next section, we briefly
discuss network communities. In section 3, we present a profile of the technical
discussion activities in the CSDN forum. In section 4, we discuss the network
topology models developed in the past decade or so. Section 5 focuses on methodological
issues, introducing the process of data collection and social network analysis. Section 6
presents the results of data analysis, highlighting the structural properties of the CSDN
network according to the topology models presented in section 4. Section 7 discusses
these results further and explains their implications for technology management and
practice. Finally, section 8 highlights some directions for further research.
2. Network Communities: Offline and Online
Continuous development of transportation and communication technologies has greatly
expanded the reach of people’s social activities in the past decades. In contemporary
society, people interact not only with others in their local neighborhood, but also with
people who live across large geographical distances. This long-distance interaction has
transformed people’s understanding of the notion of community, from a localized concept
to a sparsely knit interpersonal network of social ties, i.e. a network community
(Wellman 1979, Wellman et al 1997). Consistent with each person’s multiple social
relations, for example as employee, friend, neighbor, family member, or sports club fan,
everybody maintains several corresponding social networks at the same time. People
benefit from their personal networks by obtaining resources, seizing opportunities, and
reducing uncertainty (Wellman and Gulia 1999, Burt 2004).
An important type of network community is the interest-based community, which comprises
people sharing a common interest. Some of these communities are based on a hobby, for
example, antique collectors or sports fans. They provide opportunities, channels, and
venues for community members to share time, and informally share resources, such as
information, personal ideas and feelings. Traditionally, these communities exist locally.
Some have regular meetings or events, in which members gather and share some time
face-to-face. As people who have similar interests do not always live close to each other,
they need an easy way to communicate with each other. Since the early 1990s, the
Internet has significantly enabled long distance communication and interaction. Interest-based
communities increasingly use mailing lists or their own websites to connect
members, share information and exchange messages (Castells 2000).
According to Rheingold (2000), Internet virtual communities are the ‘social
aggregations that emerge from the Net when enough people carry on those public
discussions long enough, with sufficient human feeling, to form webs of personal
relationships in cyberspace’. Schubert and Ginsburg (2000) describe Internet
communities as the union between individuals or organizations who share common
values and interests using electronic media to communicate within a shared semantic
space on a regular basis. The emergence of virtual communities arises from people’s need
to gather and participate in informal public spaces in everyday life and their primordial
wish to look for a sense of ‘community’. Jones (1995) takes the same view, that Internet-based
‘discussion groups’ spring from the need to rebuild a sense of community in
everyday life. Virtual communities take various forms of computer mediated group
communication, such as mailing lists, newsgroups, bulletin boards, chat rooms, and
multiple user domains. In some virtual communities, members never see each other in
real life.
Many virtual communities focus on work related professional practices, for example,
scholars in academia (Koku and Wellman 2003), lawyers (Hara 2000), computer
professionals (Wasko and Teigland 2002), and open source software developers (Lakhani
and von Hippel 2000; O’Mahony and Ferraro 2004). These virtual communities provide
opportunities, channels, and venues for professionals to share everyday work related
resources, not just information, but also innovative ideas, solutions to specific problems,
professional knowledge, and the latest thinking in their field of interest. Many
participants treat such virtual communities as a place for learning and professional
problem solving (Assimakopoulos and Yan 2006). Participants benefit from these
communities by creating, accessing and exchanging new information, knowledge,
expertise and innovative ideas not available in their local community of practice (Brown
and Duguid 1998) and work environment.
Current research on virtual communities mainly follows a qualitative approach,
analyzing the function and impact of virtual communities on social life, knowledge
sharing and business practices (see the above references). Few studies have sought to
understand quantitatively the structural properties of this new form of community. This
research adopts social network analysis techniques (Wasserman and Faust 1994; Nooy et
al. 2005) to model the underlying structure of an Internet community and explore the impact
of this structure on knowledge and innovation management.
3. China Software Development Net
The Internet technical community we studied is China Software Development Net
(CSDN). CSDN (www.csdn.net) is the biggest Chinese language software technical
forum. Since its foundation in late 1999, the forum has undergone constant and
accelerating growth. By January 2006, some 1,000,000 software developers, system
administrators and other IT professionals from 20,000 IT firms, organisations, etc., from
all over mainland China, as well as Taiwan, Hong Kong, Singapore, and Malaysia, had
registered as members of the forum. The forum is based on a bulletin board system,
consisting of 30 discussion sub-forums specializing in different software and hardware
technology areas. During working hours, there are always several thousand members
online in the forum taking part in technical discussions.
The online discussions in CSDN are organized in topic-oriented sub-forums. Members are free to initiate a thread in any sub-forum by posting a message,
seeking advice to solve their specific technical problem. Every day thousands of software
technical problems are posted and discussed. Most of the online enquiries during working
hours get replies within a few hours, if not minutes. The usual reply points out what is
probably wrong and gives possible solutions. The senders are often professionals who
have experience and knowledge of similar software problems. Most of the replies are fewer
than 5 lines in length. Some posts which include original software code may be longer, up
to several pages. In some other, even longer posts, the contents are obviously copied from
electronic technical documents; the senders advise readers to refer to the documentation
to find information about solving a particular problem. The vast majority of messages
are written in mixed Chinese and English language, i.e. the software program code in
English, and the diagnosis of the problem and suggestions for solutions in Chinese.
In some cases when the problem is rather complex, the questioning-and-replying
often evolves into interactive discussions among many interested participants. The
answer to a particular question then seems more of a collaborative group result. It is common,
after several suggestions are provided by different respondents, for the member who initially
asked the question to test these alternatives and report back to the forum, including
original software code for input and output. More diagnosis and suggestions follow on
the basis of the new information. In this way, more and more contextual information is
provided and the online discussion goes deeper and deeper adding to the knowledge of all
participants. Although most of the online discussions are completed within 10 exchanges,
it is not uncommon to see enquiries receiving 30 or more replies. Sometimes such an
interactive discussion lasts a few days.
Such open discussions of technical problems in the Internet forum have a strong
impact on knowledge and innovation diffusion at the regional and national levels. As
revealed by our previous research (Assimakopoulos and Yan 2006), Internet technical
forums have overshadowed personal networks and become the most important external
source for Chinese software engineers to acquire information and knowledge to solve
their daily technical problems. Knowledge is shared among the participants, regardless of
their physical location, organizational affiliation, and technical background. Successful
solutions found through Internet forums are often the most valuable, since they are very likely
to be completely new to the local software engineering community of practice, and thus have great
value for the technical innovation of software companies.
Here we are particularly interested in the network structure of the CSDN community, in
order to understand the process of knowledge and innovation diffusion in
Internet technical communities. What structure does the CSDN community have? How
does this specific structure influence knowledge and innovation diffusion in the
community? To answer these questions, we draw on two network topology models, i.e.
small world and scale free networks, developed recently by physicists and
mathematicians. These two models have specific characteristics and implications for
knowledge and innovation diffusion, as discussed below.
4. Network Topology Models
4.1 Small world network
In large sparse networks, people often only maintain a small number of direct linkages.
This however does not mean that people are isolated from the majority of community
members. They can possibly reach many others in the network through a very short chain
of ‘friends of friends’ (Boissevain 1973). One of the pioneers in studying this phenomenon,
back in the 1960s, was Harvard psychologist Stanley Milgram. Milgram asked people to
send a letter through friends of friends to complete strangers. To his surprise,
Milgram found that letters traveled through chains of, on average, six intermediaries
to reach their destinations. Milgram’s finding became known as the ‘six degrees of separation’
of a ‘small world’.
Despite some methodological ambiguities raised by later works (for example,
Kleinfeld 2002), Milgram’s experiment indicates that the average ‘distance’, i.e.,
Average Path Length (APL), between two randomly chosen persons is small, i.e., six
intermediaries, compared to the underlying huge population involved. Moreover, it was
found that in such networks, there are many tightly-connected small groups with
overlapping internal linkages, but few ties connecting to outsiders. In such small groups,
a member’s directly connected neighbors are also often directly connected to each other.
For example, in a friendship network, two friends of a given person are likely to be
friends themselves. This phenomenon is called the ‘clustering effect’ and it is measured by a ‘Clustering
Coefficient’ (CC). CC is formally defined as the average fraction of pairs of neighbors of
a person which are also neighbors of each other. If the CC is relatively low, there is less
clustering effect, and actors are rather isolated at the local level, i.e., in the neighborhood of a
given actor.
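As a minimal sketch of the CC definition above, the following Python computes the coefficient for an undirected network stored as a dictionary of neighbor sets. The function name `clustering_coefficient`, the toy friendship network, and the convention that actors with fewer than two neighbors contribute zero are illustrative assumptions, not part of the analysis reported in this paper.

```python
from itertools import combinations


def clustering_coefficient(neighbors):
    """Average, over all actors, of the fraction of neighbor pairs that are tied.

    `neighbors` maps each actor to the set of actors it is directly tied to
    (undirected ties). Actors with fewer than two neighbors count as 0
    (an assumed convention).
    """
    total = 0.0
    for nbrs in neighbors.values():
        pairs = list(combinations(sorted(nbrs), 2))
        if not pairs:
            continue  # contributes 0 to the sum
        linked = sum(1 for a, b in pairs if b in neighbors[a])
        total += linked / len(pairs)
    return total / len(neighbors)


# Hypothetical toy network: A, B and C form a triangle; D is tied only to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
cc = clustering_coefficient(toy)
```

In the toy network, only one of A's three neighbor pairs is tied, while B and C each have their single neighbor pair tied, so the local values are 1/3, 1, 1 and 0.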
Social networks with small APL and high CC are defined as small world networks
(for detailed mathematical modeling, see, for example, Watts and Strogatz 1998). This
network topology model has important implications for diffusion theory (Rogers 2003).
Information, knowledge or innovation spreads much faster in a network with a small APL,
say 6 or less, than in a network with an APL of, for example, 100. Many
social networks have been shown to be small worlds, exhibiting both a low degree of
separation between nodes and high clustering. Examples include networks of Hollywood
actors who co-star in films (Watts and Strogatz 1998), co-authorship networks of
academic papers in a variety of disciplines (Newman 2001), transportation networks
(Latora and Marchiori 2002), networks of company directors of the Fortune 1000 list of
firms (Mariolis 2001), and email communication networks (Dodds et al 2003).
4.2 Random network
A random network is defined as a network whose connections between actors happen at
random. Random networks are based on two assumptions. Firstly, the size of the network
remains unchanged as time elapses; that is, the network does not grow over time.
Secondly, the probability of connection between any two nodes is equal for all nodes;
that is, a connection happens at random with no preference whatsoever for any network
member. As a result, in a random network the number of connections each node has
follows a Poisson distribution (Newman 2003).
Random networks also show small world properties, i.e., a small APL compared to the size
of the network. Suppose that in a large community with millions of members,
everybody has 20 random connections. A person can then reach 20 members directly,
and up to 20^n (20 to the power of n) members through a chain of n links. When n is equal to 5,
20^n is equal to 3,200,000. This means that in such a community with more than 3 million
members, on average everybody can reach any other by a chain of 5 ‘friends of friends’.
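The counting argument above can be written as a short worked equation. This is a sketch under the simplifying assumptions that every member has about k connections and that overlaps between neighborhoods are ignored; the symbol L for the chain length is introduced here for illustration only.

```latex
k^{L} \approx n
\quad\Longrightarrow\quad
L \approx \frac{\ln n}{\ln k},
\qquad \text{e.g. } k = 20,\ n = 3{,}200{,}000:\quad
L \approx \frac{\ln 3{,}200{,}000}{\ln 20} = 5 .
```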
However, random networks do not demonstrate clustering effects. As all
connections happen at random, the probability that two friends of a given person are
themselves friends is equal to the probability that two complete strangers are connected. Therefore the CC in
random networks is very low compared to small world networks. In this sense, CC is the
key indicator which distinguishes random from small world networks. A small world
network may have an APL equivalent to that of a random network, but it ought to demonstrate
a much higher CC than the random network.
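The low clustering of random networks can be checked with a small simulation. The sketch below is illustrative only: the network size, the tie probability, the seed, and the function names `random_network` and `average_clustering` are assumptions chosen for the example, and the network is treated as undirected for simplicity.

```python
import random
from itertools import combinations


def random_network(n, p, seed=0):
    """Undirected random network: each pair of nodes is tied with probability p."""
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def average_clustering(neighbors):
    """Mean fraction of a node's neighbor pairs that are themselves tied."""
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: such nodes contribute 0
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(neighbors)


n, p = 400, 0.05                 # illustrative values only
net = random_network(n, p)
mean_degree = sum(len(v) for v in net.values()) / n
cc_simulated = average_clustering(net)
cc_expected = mean_degree / n    # roughly the tie probability p
```

Because ties are independent, two neighbors of a node are tied with the same probability as any other pair, so the simulated CC stays close to the mean degree divided by the network size.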
4.3 Scale free network
According to some commentators, such as Jeong (2003), the random network topology is a
rather over-simplistic model. Key assumptions of random networks simply do not hold
when applied to many real world networks. For example, in an Internet
community like CSDN, membership does grow over time and, as in social
networks of friendship, linkages obviously do not form at random but show preferential
attachment. Individual members show a preference for certain types of members who
possess specific attributes and play active roles in the community. As a result of the
shortcomings of random networks, Barabasi and Albert (1999) put forward a new
topology model which addresses the issues discussed above with respect to random
networks. Based on investigations of the structure of the World Wide Web and other
social and technical networks, they discovered that these networks show a power law
distribution, in which the vast majority of nodes have only a few linkages, and a small
number of nodes play a significant role by connecting to an extremely large number of nodes.
Put differently, only a small percentage of actors score high on out-degree centrality
(Freeman et al 1991), and the vast majority of actors score very low.
Thus the out-degree centrality of actors follows a power law distribution. This
uneven distribution of linkages can be explained on the grounds that networks often
develop over time by adding new nodes and new linkages, and new nodes are more
likely to connect to nodes that have already developed a large number of linkages.
This type of network is termed a ‘scale free network’ (for detailed mathematical
modelling, see Albert and Barabasi, 2002). So far, it has been shown that a broad range of
networks, from biological networks (Koch and Laurent 1999), to cellular, protein and
metabolic networks (Jeong et al 2000), email networks (Ebel et al 2002), the world wide
web (Broder et al 2000), and telecommunication networks (Schintler et al 2003) are scale
free networks.
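The growth mechanism described above, in which new nodes prefer to connect to already well-connected nodes, can be sketched in a few lines of Python. This is a simplified illustration in the spirit of the Barabasi and Albert model, not the authors' procedure; the function name `preferential_attachment`, the parameter m (links per new node), the network size and the seed are all assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a network in which each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(a, b) for a in range(m + 1) for b in range(a)]
    # Each node appears in `stubs` once per link end, so a uniform draw from
    # `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(5000)
degree = Counter(node for edge in edges for node in edge)
degree_counts = Counter(degree.values())   # degree -> number of nodes
max_degree = max(degree.values())
median_degree = sorted(degree.values())[len(degree) // 2]
```

In a run of this kind, most nodes keep a degree close to m while a few early nodes accumulate far more linkages, which is the uneven distribution the text describes.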
Although the above topological models were primarily developed by physicists, they
have recently gained increasing attention in social science (Granovetter 2003) and
management (Schilling and Phelps 2004a, 2004b) research. It is also worth noting that,
among the network studies mentioned above, several authors have shown that a
variety of networks are both small world and scale free networks, for example, movie
co-star networks (Watts and Strogatz 1998), co-authorship networks (Newman 2001),
Internet discussion groups (Adamic et al 2003, Ravid and Rafaeli 2004) and airline
transport networks (Guimerà et al 2005). A weakness of these topology models, however, is
that they only model the existence or absence of a connection between any two nodes,
and assume mutuality of ties, ignoring the ‘strength’ or ‘frequency’ of connection
between actors. In other words, all linkages are viewed as equally important. Many real
life networks however show diversity in the ‘strength’ or ‘significance’ of linkages. For
example, in advice seeking networks among Chinese software engineers, some pairs of
engineers have more frequent discussion about technical problems than others
(Assimakopoulos and Yan 2006).
5. Methodology
This research follows a Social Network Analysis (SNA) perspective (Wasserman and
Faust 1994; Nooy et al. 2005) in modeling network structure. SNA is a set of theories and
methods used to uncover and describe the underlying structure of social relationships.
SNA assumes that individuals are embedded in social networks and how people behave
primarily depends upon how they are tied to, and embedded with others, or where they
are located in ongoing networks of relationships. By exploring questions like who is
linked to whom, what is the nature of the linkages, and how do the linkages affect the
actors’ and the communities’ behavior, SNA seeks to model social relationships and to
describe the structure of any social group or community.
The power of SNA stems from its fundamental difference from non-network
sociological studies. In a non-network study, researchers often focus on attributes of
individual actors, who are viewed as isolated, under-socialized, profit-maximizing
agents. From an SNA perspective, attributes of individuals are less important
than their ongoing relationships and ties with other actors within the social network in which they
are embedded. The behavior of actors therefore arises from, and is guided by, structural or
relational processes and norms arising out of the group and social network in which they
are embedded. In many empirical research settings, SNA has shown a significant
advantage as a result of including relational information among social actors, in
conjunction with statistical analysis of independent variables and attributes about the
same set of actors.
In addition to metrics borrowed from statistics, such as mean, median, and standard deviation,
SNA provides an additional set of quantitative and qualitative concepts,
vocabulary, and techniques to analyze relational data. For example, SNA uses the term
‘degree’ to measure how many other actors directly link to a certain actor; uses
‘centrality’ to measure how critical an actor is in a network according to a measure such
as ‘degree’; and ‘density’ to measure how closely a group of actors are connected. There
are many other important quantitative concepts, for example, centralization, component,
clique, role, position, and so on (Wasserman and Faust, 1994). In this paper, we will
adopt SNA techniques to analyze the distribution of the actors’ in and out degree in the
CSDN technical community.
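To make the terms ‘degree’, ‘centrality’ and ‘density’ concrete, here is a minimal Python sketch on a hypothetical set of directed reply ties. The actor names and ties are invented for illustration, and normalizing out degree by n - 1 is one common convention assumed here, not necessarily the one used by the software packages cited in this paper.

```python
from collections import Counter

# Hypothetical directed 'reply' ties: (replier, questioner).
ties = [("ann", "bob"), ("ann", "cai"), ("bob", "cai"), ("dee", "ann")]

actors = sorted({a for tie in ties for a in tie})
out_degree = Counter(src for src, _ in ties)   # ties sent by each actor
in_degree = Counter(dst for _, dst in ties)    # ties received by each actor

n = len(actors)
# Density of a directed network without loops: observed ties / possible ties.
density = len(set(ties)) / (n * (n - 1))
# Normalized out-degree centrality: share of the other n - 1 actors reached.
out_centrality = {a: out_degree[a] / (n - 1) for a in actors}
```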
The collection of relational data in terms of questioning and replying linkages took
place in CSDN during a three month period from October 3rd to December 31st of 2005
(91 days, 13 weeks). 74,066 software engineers took part in online discussion during this
period and produced 188,776 threads of discussion, consisting of 1,207,433 reply posts.
During the three months, as shown in Figure 1, 26,330 threads (13.9%) did not get any
reply; 50.2% of threads were completed within 1 to 5 replies; 21.3% of threads were
completed within 6 to 10 replies; only 4.9% of threads got more than 20 replies. The
‘hottest’ question got 937 replies. On average, every thread had 6.4 replies, with a standard
deviation of 11.0.
[Figure 1: bar chart. Horizontal axis: number of replies in the thread; vertical axis: number of
threads. Share of threads per bin: 0 replies, 13.9%; 1-5, 50.2%; 6-10, 21.3%; 11-15, 7.0%;
16-20, 2.7%; 21-30, 2.3%; ≥ 31, 2.6%.]
Figure 1: Distribution of the length of the discussion threads in
CSDN community
The thread data were recoded into person-to-person replying
linkages. There are 770,913 linkages in total among the 74,066 nodes in the network. The
linkages are directed from the member who posted the reply to the member who
asked the question. In the online discussions, many people who initiated threads also took
part in the discussion; this produced 39,247 self-replying linkages, i.e. loops, which do
not give us meaningful information about network structure. After removing these
loops, there were 731,666 directed linkages in the CSDN network under study.
It is also worth noting that the replying linkages are valued. One member may post
more than one reply in one or many threads initiated by another member. Linkages
with a higher value are more data intensive and perhaps carry more information than
linkages with a lower value. In other words, they can be regarded as ‘strong’ linkages. In the
CSDN network, the average value of linkages is 1.31, with a standard deviation of 1.22. The
‘strongest’ linkage has a value of 116. Table 1 shows the distribution of the intensity of these
linkages.
Table 1: Distribution of intensity of the linkages
Intensity (value) of the linkages    Number     Percentage (%)
1                                    610,267    83.4
2                                    80,385     11.0
3                                    21,962     3.0
4                                    8,142      1.1
5                                    3,954      0.5
≥6                                   6,956      1.0
However, scale free and small world network models take account only of the
existence of linkages, and ignore the intensity of communication. This is one of the
drawbacks of these models (Granovetter 2003). To test whether the network under
investigation is scale free or small world, we simply ‘binarized’ all linkages: 1 denotes
the existence of a ‘reply’ linkage, while 0 denotes no linkage between any two members
of the network.
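The recoding steps described in this section (threads to valued person-to-person linkages, removal of loops, binarization) can be sketched as follows. The thread records and member names are hypothetical, and the data layout is an assumption made for illustration; the actual processing was carried out with the software packages listed below.

```python
from collections import Counter

# Hypothetical threads: (member who asked, members who posted each reply, in order).
threads = [
    ("asker1", ["expertA", "expertB", "asker1", "expertA"]),
    ("asker2", ["expertA"]),
    ("asker1", ["expertA"]),
]

# Valued directed linkages: (replier, asker) -> number of replies.
valued = Counter()
for asker, repliers in threads:
    for replier in repliers:
        valued[(replier, asker)] += 1

# Remove self-replies (loops), which carry no structural information.
loops = {pair: v for pair, v in valued.items() if pair[0] == pair[1]}
valued_no_loops = {pair: v for pair, v in valued.items() if pair[0] != pair[1]}

# Binarize: keep only whether a linkage exists, discarding its value.
binary = {pair: 1 for pair in valued_no_loops}

mean_value = sum(valued_no_loops.values()) / len(valued_no_loops)
```

In this toy input, expertA replies three times across two threads started by asker1, so that linkage has value 3 before binarization and value 1 afterwards.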
For the data analysis we used the following software packages: Pajek (Batagelj and
Mrvar 2004), Ucinet (Borgatti et al. 2002), SPSS and Microsoft Excel.
6. Results
6.1. Small world network
The APL and CC are the key indicators for detecting a small world network
topology. A small world network should have a small APL, like a random network, and a
CC orders of magnitude higher than that of the equivalent random network. A small world
network topology therefore has $\mathrm{APL} \approx \mathrm{APL}_{random}$ and $\mathrm{CC} \gg \mathrm{CC}_{random}$. Watts and Strogatz
(1998) adopted the same criteria to test whether a network is a small world, i.e.,
$\mathrm{APL} / \mathrm{APL}_{random} \approx 1$ and $\mathrm{CC} / \mathrm{CC}_{random} \gg 1$. The APL and CC of a random network can be
calculated as follows:
$\mathrm{APL}_{random} = \ln(n) / \ln(k)$, and
$\mathrm{CC}_{random} = k / n$
where n is the number of actors in the network, and k is the average out degree of
the network (for detailed mathematical modelling, see Newman 2003). In the CSDN
network, APL = 4.5528 and CC = 0.05389. The size of the CSDN network is n = 74,066, and
the average out degree of its actors is k = 9.8786. For a random network with the same
number of nodes and linkages, $\mathrm{APL}_{random}$ = ln(n)/ln(k) = ln(74066)/ln(9.8786) = 4.8956, and
$\mathrm{CC}_{random}$ = k/n = 0.0001334; see Table 2.
Table 2: Average path lengths and clustering coefficients for CSDN and equivalent random network
       CSDN network    Theoretical random network with same    Ratio between CSDN network
                       number of nodes and linkages            and random network
APL    4.5528          4.8956                                  0.9300
CC     0.05389         0.0001334                               404.0
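The random-network baselines and the two ratios in Table 2 can be reproduced from the reported network size, number of linkages, APL and CC. The sketch below only re-derives the arithmetic from those published figures; the variable names are illustrative.

```python
import math

n = 74066          # actors in the CSDN network
links = 731666     # directed linkages after removing loops
apl_csdn = 4.5528  # observed average path length (Table 2)
cc_csdn = 0.05389  # observed clustering coefficient (Table 2)

k = links / n                            # average out degree
apl_random = math.log(n) / math.log(k)   # ln(n) / ln(k)
cc_random = k / n

apl_ratio = apl_csdn / apl_random
cc_ratio = cc_csdn / cc_random
```

The APL ratio comes out close to 1 and the CC ratio in the hundreds, which is the pattern the small world criteria require.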
The APL of the CSDN network is small, 4.5528. This suggests that a message from a
member can reach any other member through fewer than 5 intermediaries on average. More
importantly, the CC of the CSDN network is 404 times that of a random network with
the same number of nodes and linkages. In other studies of small world networks, the
ratio $\mathrm{CC} / \mathrm{CC}_{random}$ is also high. For example, in Ravid and Rafaeli’s (2004) university
online discussion groups, this ratio is 216; in Adamic et al’s (2003) Stanford University
student Internet social club, this ratio is 40. Therefore we argue that the CSDN network
has a small world topology.
6.2. Scale free network
As discussed above, in a scale free network the degree of actors follows a power law
distribution. Generally, a power law distribution is represented by the following formula:
$Y = a_0 X^{a_1}$
where $a_0$ and $a_1$ are coefficients.
We are particularly interested in the distribution of the out degree in the CSDN
community, because outward linkages are the key to the vitality of the community.
The members with high out degree form the core of the network, animating the online
discussion and keeping the community sustainable over time. Therefore in our analysis
the independent variable X is out degree, and the dependent variable Y is the number of
actors who have that out degree, i.e. frequency. The formula can be presented as:
$\text{frequency} = a_0 \, (\text{out degree})^{a_1}$
Taking the base-10 logarithm (lg) of both sides, we get a linear model of the relation
between lg(frequency) and lg(out degree):
$\lg(\text{frequency}) = b_0 + b_1 \lg(\text{out degree})$
where $b_0 = \lg(a_0)$ and $b_1 = a_1$.
We use linear regression to model the CSDN network and find the
coefficients $b_0$ and $b_1$, and then calculate $a_0 = 10^{b_0}$ and $a_1 = b_1$. Table 3 shows the linear
regression model as output by SPSS.
Table 3: The linear regression model between lg(frequency) and lg(out degree)
(dependent variable: frequency)
Model         b0       b1        R2       Sig      a0 (= 10^b0)    a1 (= b1)
Out degree    5.122    -2.002    0.929    0.000    132434.2        -2.002
Therefore, we get:
$\lg(\text{frequency}) = 5.122 - 2.002 \lg(\text{out degree})$
This is a very strong linear relation between lg(frequency) and lg(out degree), with $R^2$
= 0.929. The linear relation is significant (p<0.001). Converting the formula back to a
power function, we get:
$\text{frequency} = 132434.2 \, (\text{out degree})^{-2.002}$
Figure 2 presents the power law distribution of the out degree of the CSDN
participants, and the linear model in logarithmic scale.
[Figure 2: plot of frequency (vertical axis, 0 to 20,000) against out degree of nodes
(horizontal axis, 0 to 5,000).]
Figure 2: Power law distribution of the out degree of nodes in
CSDN network
The above analysis reveals that the out degree of the participants follows a statistically
significant power law distribution, which indicates that the CSDN network is also a scale free network.
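The regression of lg(frequency) on lg(out degree) can be sketched in plain Python as an ordinary least squares fit. This is an illustrative re-implementation, not the SPSS procedure used for Table 3; the function name `fit_power_law` is hypothetical, and the input below is synthetic data generated from the fitted formula itself rather than the CSDN degree counts.

```python
import math


def fit_power_law(degree_counts):
    """Ordinary least squares fit of lg(frequency) = b0 + b1 * lg(degree).

    `degree_counts` maps an out degree (> 0) to the number of actors with it.
    Returns (b0, b1, r_squared); the power law coefficients are
    a0 = 10 ** b0 and a1 = b1.
    """
    xs = [math.log10(d) for d, f in degree_counts.items() if d > 0 and f > 0]
    ys = [math.log10(f) for d, f in degree_counts.items() if d > 0 and f > 0]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return b0, b1, r_squared


# Synthetic check (not CSDN data): counts generated from the fitted model itself.
synthetic = {d: 132434.2 * d ** -2.002 for d in range(1, 200)}
b0, b1, r2 = fit_power_law(synthetic)
a0 = 10 ** b0
```

On exact power law input the fit recovers the coefficients used to generate it; with real degree counts the points scatter around the line and R2 falls below 1, as in Table 3.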
7. Discussion
As the above analysis reveals, the questioning and answering network in the CSDN online
software technical community has both a small world and a scale free network topology. This
specific topological structure has significant implications for knowledge and
innovation diffusion in the community.
7.1. Implications of small world topology
In the CSDN software engineering community, there is a great deal of heterogeneity
rooted in the large number of participants coming from a wide range of technical
backgrounds. Tens of thousands of software engineers actively discuss technical questions in
the 30 sub-forums focusing on different technical topics, making the CSDN community an
active virtual venue for information and knowledge sharing. As a small world network,
CSDN has a very high clustering coefficient, 404 times that of an equivalent random
network, which indicates that there is a huge number of cohesive clusters in the network. The
members of each cluster generally share a similar technical background and a common
interest in a specific technical topic, so that they have frequent discussions involving
everybody in the cluster. Frequency and intensity of interaction facilitate the development
of trust and reciprocity norms that can increase individuals’ willingness to exchange
information (Bouty 2000). Information and knowledge circulate rapidly in these clusters
due to the members’ homogeneous background and frequent interaction.
As a result of the short average path length, participants in the CSDN community can
easily be connected through chains of a small number of members. Information and
knowledge can travel relatively easily from one software engineer to any other
engineer in the community. The short path length indicates an abundance of weak
ties (Granovetter 1973) connecting the cohesive clusters, providing boundary-crossing
channels for the specific knowledge of each homogeneous cluster to flow from one to
another. The integration of information and knowledge from heterogeneous sources is
important to innovation and to the development of software technology practice, both at
the micro level of the company and at the macro regional and national levels.
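The two small world measures discussed above, clustering coefficient and average path length, can be illustrated on a toy graph. The following is a minimal sketch in Python, not the procedure used on the CSDN data; the graph (two triangles joined by a single weak tie) and all names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # count the links that exist among the neighbours of this node
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    dist_sum, pairs = 0, 0
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs


# Hypothetical toy network: two cohesive clusters (triangles) and one weak tie c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(clustering_coefficient(adj))
print(average_path_length(adj))
```

In this toy case the single tie between c and d keeps every pair of members within three steps of each other while clustering stays high, which is the combination the small world model describes.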
7.2. Implications of scale free topology
The out degree of CSDN members is exponentially distributed, indicating that a very small
number of participants send a large percentage of the replying linkages, while the
majority of members participate in only a few discussion threads. In social network
analysis, actors with a large number of direct linkages are termed hubs, which play an
important role in providing connectivity for the network.
Table 4: Linkages sent from the top repliers
Percentage of top repliers    Percentage of linkages sent
0.01%                         2.52%
0.1%                          10.56%
1%                            38.33%
5%                            69.76%
10%                           81.26%
Out degree is of particular interest to network researchers, because the hubs of
outward linkages form the core of the community. As shown in Table 4, in the
CSDN network the top repliers, accounting for only 0.1% of the members, send more
than 10% of the replying linkages, while the top 1% of members account for nearly two
fifths of the replying linkages. Nearly 70% of the replying linkages come from the top 5%
of repliers.
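The kind of concentration figures reported in Table 4 can be computed from a list of out degrees as sketched below. This is an illustrative Python sketch only; the function name and the ten out degree values are hypothetical and are not CSDN data.

```python
def share_sent_by_top(out_degrees, top_fraction):
    """Fraction of all linkages sent by the top `top_fraction` of repliers."""
    ranked = sorted(out_degrees, reverse=True)
    # always include at least one replier in the "top" group
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical out degrees of ten repliers (100 linkages in total).
out_degrees = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]

for fraction in (0.1, 0.2, 0.5):
    print(fraction, share_sent_by_top(out_degrees, fraction))
```

Applied to the full list of CSDN out degrees, the same calculation would yield one row of Table 4 for each chosen fraction.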
The members who have a high out degree are generally technical experts. They have
solid experience and skills in software development and, more importantly, have the
goodwill and passion to help others solve technical problems. Some of the active
members sent a huge number of replying linkages; for example, the top three repliers have
4,775, 2,206 and 1,854 outward linkages during the three month period. This means that the
three members initiate 79.6, 36.8 and 30.9 new replying linkages every day on average (60
working days). When we calculate the replying linkages initiated by the core members at
different levels, we find that the top 0.01% of the active repliers, 7 in number, each initiate 34.1
new replying linkages every day; the 0.1% core members (74 in number) each initiate 17.4 new
linkages; and the 1% core members (741 in number) each initiate 6.3 linkages every day. These
figures are remarkably high, considering that one linkage may carry many replying
messages during the three month period.
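The daily rates quoted for the top three repliers follow directly from dividing their out degree by the 60 working days stated in the text, as this short check shows (variable names are illustrative):

```python
WORKING_DAYS = 60  # working days in the three month period, as stated in the text

# Outward linkages of the top three repliers, as reported in the text.
top_three_out_degree = [4775, 2206, 1854]

per_day = [round(links / WORKING_DAYS, 1) for links in top_three_out_degree]
print(per_day)  # [79.6, 36.8, 30.9]
```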
Due to the extremely uneven distribution of the linkages, information in scale free
networks can diffuse rapidly. In traditional information diffusion and innovation adoption
models, which are built on random networks, the cumulative number of receivers
follows an S-shaped curve. Only when the number of information bearers or new innovation
users surpasses a critical threshold does the diffusion speed up and spread system-wide in an
exponential fashion. Below the threshold, the diffusion of information or innovation
dies out (Valente 1994).
Figure 3: Diffusion curve in scale free networks (users against time; normal diffusion
curve compared with the diffusion curve in scale free networks)
Physicists have recently shown that in scale free networks the threshold of explosion
is actually zero (Dezso and Barabasi 2002, Gronlund and Holme 2005). Owing to the ‘scale
free’ connectivity of hubs, new information or innovation reaches the hubs almost immediately
after entering the network and then spreads rapidly to the huge number of nodes
connected to them. The information saturates the network in just a few steps
of diffusion. In the CSDN community, this means that a new technology can spread to all
the community members through the questioning and replying network in a very short
period of time.
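For reference, the vanishing threshold is commonly written as follows in the epidemic spreading literature. This is the standard mean-field expression for uncorrelated networks with a power-law degree distribution, and it is not derived from the CSDN data. The symbols are assumptions introduced here: \lambda_c is the critical spreading rate, k the degree of a node, P(k) the degree distribution, \gamma its exponent and N the number of nodes.

```latex
\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle},
\qquad
P(k) \sim k^{-\gamma},\ 2 < \gamma \le 3
\;\Longrightarrow\;
\langle k^2 \rangle \to \infty \ \text{as}\ N \to \infty
\;\Longrightarrow\;
\lambda_c \to 0 .
```

The second moment of the degree distribution is dominated by the hubs, which is why their presence removes the critical mass that diffusion needs in a random network.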
The overwhelming role that hubs play in scale free networks also brings a problem:
the network is very vulnerable to intentional, systematic attacks. Scale free networks are
very robust under random attacks. The network would survive even if up
to 80% of the nodes were randomly removed. In the case of an intentional attack, by contrast,
the network would collapse once only the key hubs are destroyed. As shown above, the top
1% of active members of the CSDN community, about 740 in number, account for nearly 40% of the
outward linkages. If these active members were removed from the network,
the community, which has one million members, would likely collapse, because questions
posted in the forum would go unanswered. The removal of the key hubs, however,
must happen simultaneously, as scale free networks have a strong capability for self-healing.
When an insufficient number of hubs are removed, new hubs emerge
rapidly with the growth of the network.
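The contrast between removing an ordinary member and removing a hub can be seen on a very small example. The sketch below is illustrative Python on a hypothetical ten-node network, not a simulation of CSDN; it measures the size of the largest connected component after a removal, treating linkages as undirected.

```python
def largest_component(nodes, edges):
    """Size of the largest connected component (linkages treated as undirected)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        # skip linkages that touch a removed node
        if u in parent and v in parent:
            parent[find(u)] = find(v)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def after_removal(nodes, edges, removed):
    """Largest component size once the nodes in `removed` are deleted."""
    kept = [n for n in nodes if n not in removed]
    return largest_component(kept, edges)


# Hypothetical toy network: hub 0 replies to members 1..9; member 1 also replies to 2.
nodes = list(range(10))
edges = [(0, i) for i in range(1, 10)] + [(1, 2)]

print(after_removal(nodes, edges, removed={9}))  # one ordinary member removed: 9
print(after_removal(nodes, edges, removed={0}))  # the hub removed: 2
```

Losing an ordinary member leaves the rest of the toy network connected, whereas losing the single hub fragments it into isolated members and one pair.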
8. Further Research
In this paper, we have primarily analyzed the structural properties of the CSDN software
engineering community following the scale free and small world models. Further research
will proceed in the following directions:
1. Analysis at the individual level. What are the characteristics of the key actors who play
different roles in the community as bridges, hubs, or opinion leaders? How can we
quantitatively measure and evaluate their roles in the community?
2. Analysis at the sub-structure level. How are the cohesive groups in the Internet
technical community organized? What are the topological features of these
sub-structures? Are they free of the restrictions of the geographic distribution of community
members? What are the underlying disciplines grouping the members?
3. The dynamics of the community. How did the structure of the network evolve
during the past five years, when the community grew from a start-up to one million
members? What happened to the exponential coefficients a0 and a1, APL and CC
during this process?
References
1. Adamic, L.A., Buyukkokten, O. and Adar, E. (2003) A social network caught in the web,
First Monday, 8 (6) [online] available at [ www.firstmonday.dk/issues/issue8_6/adamic/ ]
2. Albert, R. and Barabasi, A.L. (2002) Statistical mechanics of complex networks, Reviews of
Modern Physics, 74: 47-97.
3. Assimakopoulos, D. and Yan, J. (2006) Sources of Knowledge Acquisition for Chinese
Software Engineers, R&D Management, 36 (1): 97-106.
4. Barabasi, A.L. and Albert, R. (1999) Emergence of scaling in random networks. Science 286:
509–512.
5. Batagelj, V. and Mrvar, A. (2004) Pajek: Program for Large Network Analysis, [online]
available at [http://vlado.fmf.uni-lj.si/pub/networks/pajek/]
6. Boissevain, J. (1974) Friends of Friends: Networks, Manipulators, and Coalitions, Oxford:
Blackwell.
7. Borgatti, S.P., Everett, M.G. and Freeman, L.C. (2002) Ucinet for Windows: Software for
Social Network Analysis. Harvard, MA: Analytic Technologies.
8. Bouty, I. (2000) Interpersonal and interaction influences on informal resource exchanges
between R&D researchers across organizational boundaries, Academy of Management
Journal, 43: 50-66.
9. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and
Wiener, J. (2000) Graph Structure in the Web, Computer Networks, 33: 309-320.
10. Brown, J.S. and Duguid, P. (2001) Knowledge and organization: a social-practice perspective,
Organization Science, 12 (2): 198-213.
11. Brown, J.S. and Duguid, P. (1998) Organizing Knowledge, California Management Review,
40 (3): 90-112.
12. Burt, R. (2004) Structural Holes and Good Ideas, American Journal of Sociology, 110 (2):
349-99.
13. Castells, M. (2000) The Rise of the Network Society (2nd edition), Oxford: Blackwell.
14. Dezso, Z. and Barabasi, A.L. (2002) Halting viruses in scale-free networks, Physical Review E,
65, 055103 (R).
15. Dodds, P.S., Muhamad, R. and Watts, D.J. (2003) An experimental study of search in global
social networks, Science, 301: 827-829.
16. Ebel, H., Mielsch, L.I. and Bornholdt, S. (2002) Scale-free topology of e-mail networks,
Physical Review E, 66.
17. Freeman, L.C., White, D.R. and Romney, A.K. (eds.) (1991) Research Methods in Social Network
Analysis, George Mason University Press.
18. Granovetter, M. (2003) Ignorance, knowledge and outcomes in a small world, Science, 301:
773-774.
19. Granovetter, M.S. (1973) The Strength of Weak Ties, American Journal of Sociology, 78 (6):
1360-1380.
20. Gronlund, A. and Holme, P. (2005) A network-based threshold model for the spreading of
fads in society and markets, arXiv:physics/0505050v1.
21. Guimerà, R., Mossa, S., Turtschi, A. and Amaral, L. (2005) The worldwide air transportation
network: Anomalous centrality, community structure, and cities' global roles, Proceedings of
the National Academy of Sciences, U.S.A., 102: 7794-7799.
22. Hara, N. (2000) Social construction of knowledge in professional communities of practice:
tales in courtrooms. Unpublished doctoral dissertation, Bloomington: Indiana University.
23. Jeong H., B. Tombor, R. Albert, Z.N. Oltvai and A.-L. Barabasi (2000), The Large-Scale
Organization of Metabolic Networks. Nature, 407: 651-654.
24. Jeong, H. (2003) Complex scale–free networks, Physica A, 321: 226-237.
25. Jones, S.G. (1995) Understanding Community in the Information Age, in Jones, S.G. (Ed.),
Cybersociety: Computer-Mediated Communication and Community, 10-35, London: Sage.
26. Kleinfeld, J.S. (2002) Could It Be A Big World After All? Society, 39: 61-66.
27. Koch, C. and Laurent, G. (1999) Complexity and the Nervous System, Science, 284 (5411):
96-98.
28. Koku, E. and Wellman, B. (2003) Scholarly Networks as Learning Communities: The Case of
Technet, in Barab, B., Kling, R. and Gray, J., (Eds) Designing for Virtual Communities in the
Service of Learning, Cambridge: Cambridge University Press, 299-337.
29. Lakhani and von Hippel (2000) How Open Source software works: Free user-to-user
assistance, MIT Sloan School of Management Working Paper #4117, [available at
http://web.mit.edu/evhippel/www/opensource.PDF]
30. Latora, V. and Marchiori, M. (2002) Is the Boston Subway a Small-World Network? Physica
A, 314: 109.
31. Mariolis, P. (2001) Interlocking directorates and control of corporations, Social Science
Quarterly, 56: 425-439.
32. Newman, M.E.J. (2003) The structure and function of complex networks, SIAM Review, 45
(2): 167-256.
33. Newman, M.E.J. (2001) Scientific collaboration networks: I. Network construction and
fundamental results, Physical Review E, 64.
34. Nooy, W., Mrvar. A. and Batagelj, V. (2005) Exploratory Social Network Analysis with
Pajek, Cambridge: Cambridge University Press.
35. O’Mahony, S. and Ferraro, F. (2004) Manage the boundary of an open project, Working paper
No.537, IESE Business School, University of Navarra, Spain.
36. Ravid, G. and Rafaeli, S. (2004) Asynchronous discussion groups as small world and scale
free networks, First Monday, 9 (9) [online] available at
[http://www.firstmonday.org/issues/issues9_9/ravid/]
37. Rheingold, H. (2000) The Virtual Community: Homesteading on the Electronic Frontier
(revised edition), Cambridge, MA: MIT Press.
38. Rogers, E.M. (2003) Diffusion of Innovations (5th edition), New York: Free Press.
39. Schintler, L.A., Gorman, S.P., Reggiani, R., Patuelli, R. and Nijkamp, P. (2003) Scale-Free
Phenomena in Communications networks: A Cross Atlantic Comparison, [online] available at
[http://ideas.repec.org/p/wiw/wiwrsa/ersa03p436.html]
40. Schubert, P. and Ginsburg, M. (2000) Virtual communities of transaction: the role of
personalization in electronic commerce, Electronic Markets, 10 (1): 45-55.
41. Teigland, R. (2003) Knowledge Networking: Structure and Performance in Networks of
Practice, Stockholm: Stockholm School of Economics Press.
42. Valente, T. (1994) Network Models of the Diffusion of Innovations, New Jersey: Hampton
Press.
43. Wasko, M. and Faraj, S. (2000) It is what one does: Why people participate and help others in
electronic communities of practice, Journal of Strategic Information Systems, 9 (2-3): 155-173.
44. Wasko, M.M. and Teigland, R. (2002) The provision of online public goods: Examining social
structure in an electronic network of practice, in the Proceedings of the 23rd International
Conference on Information Systems, Barcelona, Spain.
45. Wasserman, S. and Faust, K. (1994) Social Network Analysis: Methods and Applications,
Cambridge: Cambridge University Press.
46. Watts, D. J. (1999) Small Worlds: The Dynamics of Networks Between Order and
Randomness, Princeton: Princeton University Press.
47. Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of ‘small world’ networks, Nature,
393: 440-442.
48. Wellman, B. (1979) The community question, American Journal of Sociology, 84, 1201-1231.
49. Wellman, B., R. Wong, D. Tindall and N. Nazer (1997) A Decade of Network Change:
Turnover, Mobility and Stability. Social Networks 19 (1): 27-51.