Biclique communities

Sune Lehmann, Martin Schwartz, and Lars Kai Hansen
Phys. Rev. E 78, 016108 – Published 21 July 2008

Abstract

We present a method for detecting communities in bipartite networks. Based on an extension of the k-clique community detection algorithm, we demonstrate how modular structure in bipartite networks presents itself as overlapping bicliques. If bipartite information is available, the biclique community detection algorithm retains all of the advantages of the k-clique algorithm, but avoids discarding important structural information when performing a one-mode projection of the network. Further, the biclique community detection algorithm provides a level of flexibility by incorporating independent clique thresholds for each of the nonoverlapping node sets in the bipartite network.
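
The information loss under one-mode projection mentioned above can be checked on a toy example. The sketch below is illustrative only: it encodes the three bipartite configurations described in the Figure 1 caption as edge lists and shows that they collapse to the same projected network. The function name `one_mode_projection` and the numbering of the complementary nodes in case (c) are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def one_mode_projection(edges):
    """Project bipartite (delta_node, gamma_node) edges onto the delta set.

    Two delta nodes are linked in the projection if they share at least
    one gamma node. (Hypothetical helper, written for this illustration.)
    """
    members_of = {}
    for delta, gamma in edges:
        members_of.setdefault(gamma, set()).add(delta)
    projected = set()
    for members in members_of.values():
        # Sorting makes each undirected pair appear in one canonical order.
        for u, v in combinations(sorted(members), 2):
            projected.add((u, v))
    return projected


all_pairs = list(combinations("abcd", 2))

# (a) all four delta nodes share the single gamma node 1
case_a = [(n, 1) for n in "abcd"]

# (b) every pair of delta nodes is linked via its own gamma node (1..6)
case_b = [(n, i) for i, pair in enumerate(all_pairs, start=1) for n in pair]

# (c) a, b, d share gamma node 1; the remaining pairs use exclusive nodes
#     (the labels 2, 3, 4 are an assumption of this sketch)
remaining = [("a", "c"), ("b", "c"), ("c", "d")]
case_c = [(n, 1) for n in "abd"] + [
    (n, i) for i, pair in enumerate(remaining, start=2) for n in pair
]

proj_a = one_mode_projection(case_a)
proj_b = one_mode_projection(case_b)
proj_c = one_mode_projection(case_c)
print(proj_a == proj_b == proj_c)
```

All three projections are the complete graph on {a,b,c,d}, so the projected network alone cannot distinguish one shared complementary node from six exclusive ones.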

  • Received 26 October 2007

DOI:https://doi.org/10.1103/PhysRevE.78.016108

©2008 American Physical Society

Authors & Affiliations

Sune Lehmann (1,2,3), Martin Schwartz (3,4), and Lars Kai Hansen (3)

  • (1) Center for Complex Network Research and Department of Physics, Northeastern University, Boston, Massachusetts 02115, USA
  • (2) Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard University, Boston, Massachusetts 02115, USA
  • (3) Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
  • (4) IT University of Copenhagen, DK-2300 Copenhagen S, Denmark

Issue

Vol. 78, Iss. 1 — July 2008

Images

  • Figure 1
    (Color online) Three distinct bipartite networks that result in identical one-mode projections. In the first case, (a), the nodes Δ={a,b,c,d} share the single node Γ={1} in the complementary set. The second case, (b), which includes the nodes Δ={a,b,c,d} and the complementary nodes Γ={1,2,3,4,5,6}, has every pair of nodes in the Δ level linked via a different node in the complementary set. In the third case, (c), three of the four Δ={a,b,c,d} nodes, (a,b,d), share a single node in the complementary node set, while all other linkages between Δ nodes in this network are pairwise and run via nodes in the complementary set that are exclusive to the two nodes linked.
  • Figure 2
    (Color online) Maximally connected bigraphs. The notation $K_{a,b}$ means that the complete bigraph consists of a of the black nodes in the Δ set and b of the larger nodes in the Γ set.
  • Figure 3
    (Color online) Biclique adjacency. Two $K_{a,b}$ cliques are adjacent if they share at least a $K_{a-1,b-1}$ clique. In this figure we list a few examples. The two adjacent $K_{1,2}$ cliques share a $K_{0,1}$ biclique, the two adjacent $K_{1,3}$ cliques share a $K_{0,2}$ clique, the two adjacent $K_{2,2}$ cliques overlap by a $K_{1,1}$ clique, and the two adjacent $K_{2,3}$ cliques share a $K_{1,2}$ clique.
  • Figure 4
    (Color online) Networks of communities in the cond-mat network [40] for various choices of $K_{a,b}$. In these plots, authors are represented by dark red and papers by light green; thus author node overlap is shown as a dark red link and paper overlap is shown as a light green link. Panel (a) shows the network of communities for $K_{1,2}$, panel (b) shows the network of communities for $K_{8,2}$, panel (c) shows the network for $K_{3,5}$, and panel (d) describes the case of $K_{2,12}$. See the main text for details. All panels are screenshots from BCFinder [33].
  • Figure 5
    (Color online) In many sparse real-world networks, the number of maximal bicliques grows linearly with the size of the input database. Panel (a) shows the number of maximal bicliques found for the IMDb [19] and cond-mat [40] networks as a function of the size of the networks (measured in number of edges). The solid line labeled by circles shows the number of bicliques found in the IMDb data and the dashed line labeled by circles is the number of bicliques in the randomized version of the same network; the lines labeled by triangles show the same quantities for the cond-mat network. The network was randomized using a bipartite version of the algorithm suggested in [39]. There are significant differences between the real and randomized data sets in the IMDb data, whereas there is little change for the cond-mat data. These differences are mainly due to the fact that, on average, there are more actors involved in the production of movies than there are authors of scientific papers. A forthcoming paper discusses the subject of biclique motifs in various bipartite networks. Panel (b) shows the growth of the bipartite adjacency matrix as a function of the number of edges included in the analysis; the solid line marked by squares is the number of distinct movies and the dashed line marked by squares is the number of actors; the lines labeled by triangles display the number of authors (solid line) and the number of papers (dashed line) for the cond-mat network. The incremental growth of the number of movies in the IMDb network is explained in the main text.
  • Figure 6
    (Color online) The biclique algorithm in action on the cond-mat network [40] (years: 1996–2006). The top panel shows a $K_{3,5}$-clique community of 4 authors and 11 papers; this community is a group of scientists studying econo-physics. The bottom panel shows another $K_{3,5}$-clique community, this time consisting of 5 authors and 13 papers. The topic of this second community is bio-physics, more specifically analyses of various biological time series. A key point is that two authors (Stanley and Amaral) are members of both communities. The division into biclique communities immediately underlines the importance of node overlap: there is no doubt that Stanley and Amaral are full members of both communities. However, it is also immediately clear why the communities are distinct: they concern different subjects. The presence of context (a list of authors is complemented by a list of papers and vice versa) greatly enriches our understanding of the communities. A list of authors and papers in these two communities can be found in the Appendix. Both panels are screenshots from BCFinder [33].
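
The adjacency rule in the Figure 3 caption and the overlapping communities in the Figure 6 caption can be illustrated with a small brute-force sketch. This is not the authors' implementation and not BCFinder. It enumerates every $K_{a,b}$ biclique directly, which is only feasible for tiny networks, whereas the Figure 5 caption indicates that the paper works with maximal bicliques. The function names, the toy author and paper labels, and the assumption a ≥ 1 are all introduced here for illustration.

```python
from itertools import combinations


def enumerate_bicliques(adj, a, b):
    """List every K_{a,b} biclique as (delta_nodes, gamma_nodes).

    adj maps each delta node to the set of its gamma neighbours.
    Brute force; assumes a >= 1 (hypothetical helper for this sketch).
    """
    found = []
    for deltas in combinations(sorted(adj), a):
        common = set.intersection(*(adj[d] for d in deltas))
        for gammas in combinations(sorted(common), b):
            found.append((frozenset(deltas), frozenset(gammas)))
    return found


def are_adjacent(c1, c2, a, b):
    """Two K_{a,b} bicliques are adjacent if they share at least a K_{a-1,b-1}."""
    return len(c1[0] & c2[0]) >= a - 1 and len(c1[1] & c2[1]) >= b - 1


def biclique_communities(adj, a, b):
    """Merge adjacent K_{a,b} bicliques into communities via union-find."""
    cliques = enumerate_bicliques(adj, a, b)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if are_adjacent(cliques[i], cliques[j], a, b):
            parent[find(i)] = find(j)

    groups = {}
    for i, (deltas, gammas) in enumerate(cliques):
        d, g = groups.setdefault(find(i), (set(), set()))
        d.update(deltas)
        g.update(gammas)
    return [(frozenset(d), frozenset(g)) for d, g in groups.values()]


# Toy author -> papers network (labels are made up for this example).
adj = {
    "a1": {"p1", "p2", "p3"},
    "a2": {"p1", "p2", "p3", "p4", "p5"},
    "a3": {"p4", "p5"},
}
communities = biclique_communities(adj, a=2, b=2)
for authors, papers in communities:
    print(sorted(authors), sorted(papers))
```

In this toy network the author a2 belongs to both $K_{2,2}$ communities, while the papers keep the two groups apart, which mirrors the overlap of two authors between the communities described for Figure 6.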