Abstract
Many real-world networks display hidden community structures with potentially important implications for their dynamics. Many algorithms have been introduced to unveil community structures, a task highly relevant to network analysis. Accurate assessment and comparison of alternative solutions are typically achieved by benchmarking the target algorithm(s) on a set of diverse networks that exhibit a broad range of controlled features, so that the assessment covers multiple representative properties. Tools have been developed to synthesize bipartite networks, but none of the previous solutions addresses the generation of networks with overlapping community structures. This is the motivation for the BNOC tool introduced in this paper. It synthesizes bipartite networks that mimic a wide range of features of real-world networks, including overlapping community structures. Multiple parameters provide flexible control over the scale and topological properties of the networks and of their embedded communities. BNOC's applicability is illustrated by assessing and comparing two popular overlapping community detection algorithms, HLC and OSLOM, on bipartite networks. The results reveal interesting features of the algorithms in this scenario and confirm the relevant role played by a suitable benchmarking tool. Finally, to validate our approach, we compare networks synthesized with BNOC with those obtained with an existing benchmarking tool and with established sets of synthetic networks, in two different scenarios.
Notes
The tool will be made available immediately after paper acceptance.
References
Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466(7307):761–764
Akoglu L (2014) Quantifying political polarity based on bipartite opinion networks. In: Proceedings of the eighth international AAAI conference on weblogs and social media (ICWSM)
Akoglu L, Faloutsos C (2009) RTG: a recursive realistic graph generator using random typing. Lecture notes in computer science, vol 5781, part 1, pp 13–28
Muscoloni A, Cannistraci CV (2018) Leveraging the nonuniform PSO network model as a benchmark for performance evaluation in community detection and link prediction. New J Phys 20(6):063022
Ali AM, Alvari H, Hajibagheri A, Lakkaraj K, Sukthankar G (2014) Synthetic generators for cloning social network data. In: Proceedings of the international conference on social informatics (SocInfo)
Armstrong TG, Ponnekanti V, Borthakur D, Callaghan M (2013) LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the international conference on management of data (SIGMOD), pp 1185–1196
Barabasi AL, Bonabeau E (2003) Scale-free networks. Sci Am 288(5):60–69
Barber MJ (2007) Modularity and community detection in bipartite networks. Phys Rev E Stat Nonlinear Soft Matter Phys 76(6):1–11
Barrett CL, Beckman RJ, Khan M, Kumar VSA, Marathe MV, Stretz PE, Dutta T, Lewis B (2009) Generation and analysis of large synthetic social contact networks. In: Proceedings of the winter simulation conference, WSC ’09, pp 1003–1014
Beckett SJ (2016) Improved community detection in weighted bipartite networks. R Soc Open Sci 3(1):140536
Birmelé E (2009) A scale-free graph model based on bipartite graphs. Discrete Appl Math 157(10):2267–2284
Boncz P (2013) LDBC: benchmarks for graph and RDF data management. In: Proceedings of the international database engineering and applications symposium, pp 1–2
Capota M, Hegeman T, Iosup A, Prat-Pérez A, Erling O, Boncz P (2015) Graphalytics: a big data benchmark for graph-processing platforms. In: Proceedings of the graph data management experiences and systems (GRADES), pp 1–6
Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the society for industrial and applied mathematics (SIAM) international conference on data mining (SDM), p 5
Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge
Cui Y, Wang X (2014) Uncovering overlapping community structures by the key bi-community and intimate degree in bipartite networks. Physica A Stat Mech Appl 407:7–14
Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 2005(09):P09008
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Du N, Wang B, Wu B, Wang Y (2008) Overlapping community detection in bipartite networks. In: Proceedings of the international conference on web intelligence (IEEE/WIC/ACM), pp 176–179
Faleiros TP, Rossi RG, de Andrade Lopes A (2017) Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs. Pattern Recognit Lett 87(Supplement C):127–138
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Exp 21(11):1129–1164
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826
Grujić J (2008) Movies recommendation networks as bipartite graphs. In: Proceedings of the international conference on computational science (ICCS). Springer, Berlin, pp 576–583
Hwang T, Sicotte H, Tian Z, Wu B, Kocher JP, Wigle DA, Kumar V, Kuang R (2008) Robust and efficient identification of biomarkers by classifying features on graphs. Bioinformatics 24(18):2023–2029
Jonnalagadda A, Kuppusamy L (2016) A survey on game theoretic models for community detection in social networks. Soc Netw Anal Min 6(1):83
Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E Stat Nonlinear Soft Matter Phys 80(5):1–11
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E Stat Nonlinear Soft Matter Phys 78(4):1–5
Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PLoS One 6(4):1–18
Largeron C, Mougel PN, Rabbany R, Zaïane OR (2015) Generating attributed networks with communities. PLoS One 10(4):1–21
Larremore DB, Clauset A, Jacobs AZ (2014) Efficiently inferring community structure in bipartite networks. Phys Rev E 90:012805
Latapy M, Magnien C, Vecchio ND (2008) Basic notions for the analysis of large two-mode networks. Soc Netw 30(1):31–48
Lehmann S, Schwartz M, Hansen LK (2008) Biclique communities. Phys Rev E Stat Nonlinear Soft Matter Phys 78(1):1–9
Li Z, Zhang S, Zhang X (2015) Mathematical model and algorithm for link community detection in bipartite networks. Am J Oper Res 5:421–434
McDaid AF, Greene D, Hurley N (2011) Normalized mutual information to evaluate overlapping community finding algorithms. eprint arXiv:1110.2515
Melamed D (2014) Community structures in bipartite networks: a dual-projection approach. PLoS One 9(5):1–5
Moussiades L, Vakali A (2009) Benchmark graphs for the evaluation of clustering algorithms. In: Proceedings of the international conference on research challenges in information science (RCIS), pp 197–206
Nettleton DF (2016) A synthetic data generator for online social network graphs. Soc Netw Anal Min 6(1):44
Newman MEJ (2001a) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64:016131
Newman MEJ (2001b) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64:016132
Newman MEJ (2010) Networks: an introduction. Oxford University Press Inc, New York
Pasta MQ, Zaidi F (2016) Leveraging evolution dynamics to generate benchmark complex networks with community structures. eprint arXiv:1606.01169
Pérez-Rosés H, Sebé F (2014) Synthetic generation of social network data with endorsements. eprint arXiv:1411.6273
Pham MD, Boncz P, Erling O (2013) S3G2: A scalable structure-correlated social graph generator. In: Proceedings in selected topics in performance evaluation and benchmarking: 4th TPC technology conference (August)
Rabbany R, Takaffoli M, Fagnan J, Zaïane OR, Campello RJGB (2013) Communities validity: methodical evaluation of community mining algorithms. Soc Netw Anal Min 3(4):1039–1062
Rees BS, Gallagher KB (2012) Overlapping community detection using a community optimized graph swarm. Soc Netw Anal Min 2(4):405–417
Rosvall M, Delvenne JC, Schaub MT, Lambiotte R (2017) Different approaches to community detection. eprint arXiv:1712.06468
Shi C, Li Y, Zhang J, Sun Y, Yu PS (2017) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37
Souam F, Aitelhadj A, Baba-Ali R (2014) Dual modularity optimization for detecting overlapping communities in bipartite networks. Knowl Inf Syst 40(2):455–488
Uslu T, Mehler A (2018) PolyViz: a visualization system for a special kind of multipartite graphs. In: Proceedings of the IEEE VIS 2018
Valejo A, Drury B, Valverde-Rebaza J, de Andrade Lopes A (2014) Identification of related Brazilian Portuguese verb groups using overlapping community detection. In: Proceedings of the international conference on computational processing of the Portuguese language. Springer, Cham, pp 292–297
Valejo A, Valverde-Rebaza JC, de Andrade Lopes A (2014) A multilevel approach for overlapping community detection. In: Proceedings of the Brazilian conference on intelligent systems (BRACIS). Springer, Berlin
Valejo A, Oliveira MCRF, Filho GP, Lopes AA (2018) Multilevel approach for combinatorial optimization in bipartite network. Knowl-Based Syst 151:45–61. https://doi.org/10.1016/j.knosys.2018.03.021
Yang Z, Perotti JI, Tessone CJ (2017) Hierarchical benchmark graphs for testing community detection algorithms. Phys Rev E 96:052311
Zhang ZY, Ahn YY (2015) Community detection in bipartite networks using weighted symmetric binary matrix factorization. Int J Mod Phys C 26:1–14
Zhong E, Fan W, Zhu Y, Yang Q (2013) Modeling the dynamics of composite social networks. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 937–945
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. This work has been partially supported by the State of São Paulo Research Foundation (FAPESP) Grants 15/14228-9 and 17/05838-3; and the Brazilian Federal Research Council (CNPq) Grants 302645/2015-2 and 301847/2017-7.
Appendix A HNOC: Extension to k-partite and heterogeneous networks
As in the case of bipartite networks, there is a lack of benchmarking tools that create k-partite and heterogeneous networks for assessing community detection and other algorithms. k-partite networks have k vertex types, rather than just two, with edges connecting only vertices of different types. In heterogeneous networks, this second restriction is dropped, i.e., edges can also connect vertices of the same type. Since both are direct generalizations of bipartite networks, we extended BNOC to synthesize general heterogeneous networks. This extension, called HNOC, can be used to generate heterogeneous information network (HIN) models that support the development and validation of new methods. HNOC inherits BNOC's major features as a flexible and robust resource for synthesizing, in reasonable time, a variety of benchmark networks with distinct properties.
A heterogeneous information network (HIN) [48] consists of m disjoint subsets of vertices of different types (called layers) and edges connecting these elements. A network \(\mathcal {X}\) with m vertex types can be partitioned into subsets \(X_i = \{ x_{i,1}, \dots , x_{i,n_i} \}\) for each type i. A HIN is represented as a graph \(G = (V, E, W)\), where \(V = \bigcup _{i=1}^m X_i\) with \(m > 2\), E is the set of edges, and W is the set of edge weights.
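To make the notation concrete, the Python sketch below stores a HIN as the list of layer sizes \(n_i\) plus a weighted edge map whose keys play the role of E and whose values play the role of W. The class name `HIN` and the encoding of vertex \(x_{i,j}\) as the pair `(i, j)` (0-based) are illustrative assumptions, not part of BNOC or HNOC.

```python
from dataclasses import dataclass, field


@dataclass
class HIN:
    """Illustrative container for G = (V, E, W), with V the disjoint union of m layers.

    Vertex x_{i,j} is encoded as the pair (i, j), using 0-based indices.
    """
    layer_sizes: list                            # n_i for each layer i
    weights: dict = field(default_factory=dict)  # (u, v) -> weight; keys form E, values form W

    @property
    def m(self) -> int:
        return len(self.layer_sizes)

    def layer(self, i: int) -> set:
        # X_i = {x_{i,1}, ..., x_{i,n_i}}
        return {(i, j) for j in range(self.layer_sizes[i])}

    def vertices(self) -> set:
        # V is the union of all layers; layers are disjoint by construction
        return set().union(*(self.layer(i) for i in range(self.m)))

    def add_edge(self, u, v, w: float = 1.0) -> None:
        # Reject endpoints that do not belong to any declared layer
        for layer, idx in (u, v):
            if not (0 <= layer < self.m and 0 <= idx < self.layer_sizes[layer]):
                raise ValueError(f"unknown vertex {(layer, idx)}")
        self.weights[(u, v)] = w


# Three layers with n_1 = 3, n_2 = 2 and n_3 = 4 vertices
g = HIN([3, 2, 4])
g.add_edge((0, 0), (1, 1), 0.5)   # weighted edge between the first two layers
print(g.m, len(g.vertices()), g.weights)
```

Keeping the layer index inside each vertex identifier makes the type mapping trivial to recover, which is convenient when edges must be checked against a schema.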
HINs have an inherently complex structure that can be difficult to handle and visualize. The structure of connections can be described by a network schema [48], a meta-template that describes the vertex and connection types of the network. Given a graph G, its network schema, denoted \(T_G(\mathcal {A}, \mathcal {R})\), is a directed graph defined over the element types \(\mathcal {A}\), whose edges are the relations in \(\mathcal {R}\). Vertices and edges of G are mapped to their types by the functions \(\varphi : V \rightarrow \mathcal {A}\) and \(\psi : E \rightarrow \mathcal {R}\), respectively. Figure 18 exemplifies network schemas for a bipartite network, a k-partite network and a heterogeneous network.
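The mappings \(\varphi\) and \(\psi\) are enough to recover the schema from a typed graph. The sketch below assumes both mappings are given as plain Python callables and returns the type set \(\mathcal {A}\) together with, for each relation in \(\mathcal {R}\), the directed (source type, target type) arcs it induces. The function name `network_schema` and the toy author-paper-term data are hypothetical.

```python
from collections import defaultdict


def network_schema(vertices, edges, phi, psi):
    """Derive the network schema T_G(A, R) of a typed graph G.

    phi maps each vertex to its type (phi: V -> A);
    psi maps each edge (u, v) to its relation (psi: E -> R).
    Returns A and, for every relation in R, the set of directed
    (source type, target type) arcs it contributes to the schema.
    """
    types = {phi(x) for x in vertices}   # A, including types of isolated vertices
    arcs = defaultdict(set)              # relation -> schema arcs
    for u, v in edges:
        arcs[psi((u, v))].add((phi(u), phi(v)))
    return types, dict(arcs)


# Toy graph: each vertex id starts with the initial of its type
TYPE = {"a": "Author", "p": "Paper", "t": "Term"}
relation = {("a1", "p1"): "writes", ("t1", "p1"): "appears_in", ("t1", "p2"): "appears_in"}

A, R = network_schema(
    vertices=["a1", "p1", "p2", "t1"],
    edges=list(relation),
    phi=lambda x: TYPE[x[0]],
    psi=lambda e: relation[e],
)
print(A)
print(R)
```

Because the schema only keeps types, the two `appears_in` edges collapse into a single schema arc from Term to Paper.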
The schema explicitly identifies the m vertex types and the r connection types. Each connection type is described by the types of its two endpoints and by the meaning of the connection, since a pair of entity types can admit multiple types of connections, as in multi-relational networks [56]. Thus, a HIN can be interpreted as a composition of r bipartite networks (or homogeneous networks, when both endpoints of a connection type have the same type), as illustrated in Fig. 19.
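This composition view suggests a direct decomposition: group the edges by connection type, keyed by both endpoint types and by the relation, so that several relations between the same pair of types stay separate. The function below is a hypothetical illustration of this reading, not code from the paper. Each group is labeled bipartite when its endpoint types differ and homogeneous otherwise.

```python
from collections import defaultdict


def decompose(edges, phi, psi):
    """Split a HIN into its r constituent subnetworks, one per connection type.

    A connection type is keyed by (source type, relation, target type).
    Each subnetwork is bipartite if its endpoint types differ and
    homogeneous if both endpoints share the same type.
    """
    groups = defaultdict(list)
    for u, v in edges:
        groups[(phi(u), psi((u, v)), phi(v))].append((u, v))
    return {
        key: ("homogeneous" if key[0] == key[2] else "bipartite", members)
        for key, members in groups.items()
    }


TYPE = {"a": "Author", "p": "Paper", "t": "Term"}
relation = {
    ("a1", "p1"): "writes",
    ("t1", "p1"): "appears_in",
    ("t1", "t2"): "next_to",   # term-term edge: a homogeneous connection type
}
parts = decompose(relation, lambda x: TYPE[x[0]], lambda e: relation[e])
for key, (kind, members) in parts.items():
    print(key, kind, members)
```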
Following this interpretation, we extended BNOC to generate synthetic heterogeneous networks. The extension iterates over the vertex types and over the connection types specified in the network schema, applying steps similar to BNOC's to build the communities in each iteration:
1. Execute m iterations of BNOC's Steps 1 and 2 to build each layer i with \(V_i\) vertices, set a single community on each layer and introduce the overlapping structures.
2. For each pair of layers specified in the schema, execute BNOC's Steps 3, 4 and 5 to establish the specified connections, weights, density and noise levels.
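The two-stage structure of these steps can be illustrated with the toy generator below: first every layer and its communities are built, then there is one pass per connected pair of layers. BNOC's Steps 1 to 5 are not reproduced in this appendix, so everything inside each stage is a simplifying assumption. Vertices are assigned to communities round-robin, each of `overlap` randomly chosen vertices per layer joins one extra random community, community c of one layer is linked only to community c of the other, and `noise` is the probability of rewiring an edge's second endpoint to a random vertex of its layer. The function and parameter names are hypothetical.

```python
import random


def toy_hin(layer_sizes, n_comms, schema, density, noise, overlap=0, seed=0):
    """Toy two-stage HIN generator (NOT BNOC/HNOC's actual procedure).

    layer_sizes[i]: number of vertices in layer i
    n_comms:        communities per layer (community c of layer i pairs with c of layer j)
    schema:         list of connected layer pairs (i, j); i == j gives same-type edges
    density, noise: dicts mapping each pair to an edge probability and a rewiring probability
    """
    rng = random.Random(seed)

    # Stage 1: build every layer's communities, then add overlapping memberships
    members = []
    for n in layer_sizes:
        comms = [set() for _ in range(n_comms)]
        for v in range(n):
            comms[v % n_comms].add(v)             # balanced round-robin assignment
        for v in rng.sample(range(n), overlap):
            comms[rng.randrange(n_comms)].add(v)  # may hit v's own community (no-op)
        members.append(comms)

    # Stage 2: one pass per connected pair of layers in the schema
    edges = set()
    for i, j in schema:
        for c in range(n_comms):
            for u in members[i][c]:
                for v in members[j][c]:
                    if rng.random() >= density[(i, j)]:
                        continue                  # no edge for this candidate pair
                    w = v
                    if rng.random() < noise[(i, j)]:
                        w = rng.randrange(layer_sizes[j])  # noise: ignore communities
                    if (i, u) != (j, w):          # no self-loops
                        edges.add(tuple(sorted([(i, u), (j, w)])))  # undirected edge
    return members, edges


members, edges = toy_hin([4, 4], 2, [(0, 1)], {(0, 1): 1.0}, {(0, 1): 0.0})
print(len(edges))
```

With density 1 and no noise, each vertex is linked to every vertex of the paired community in the other layer, which makes the planted structure easy to inspect before turning the knobs.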
Since heterogeneous networks have multiple layers and multiple connection types, the extension required modifying some BNOC parameters and introducing a few additional ones, as described in Table 3. Two parameters were added: the number of layers m and the set of connected layers e, henceforth called the “schema”. Furthermore, the dispersion and noise parameters must be defined for each pair of connected layers in the schema, since each iteration handles one such pair.
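Because dispersion and noise are set per connected pair, one natural way to hold these settings is a mapping keyed by the pairs in the schema e. The sketch below uses hypothetical field names. It assumes both values are probabilities in [0, 1] and checks that every pair in e references existing layers and has both values defined. Table 3's actual parameter names and ranges may differ.

```python
from dataclasses import dataclass


@dataclass
class HnocParams:
    """Hypothetical container for the HNOC-specific settings (names are illustrative)."""
    m: int        # number of layers
    e: list       # schema: list of connected layer pairs (i, j), 0-based
    d: dict       # dispersion (edge density) for each pair in e
    noise: dict   # noise level for each pair in e

    def validate(self) -> None:
        if self.m < 1:
            raise ValueError("m must be positive")
        for pair in self.e:
            if not all(0 <= layer < self.m for layer in pair):
                raise ValueError(f"pair {pair} references a layer outside 0..{self.m - 1}")
            for name, table in (("d", self.d), ("noise", self.noise)):
                if pair not in table:
                    raise ValueError(f"missing {name} for pair {pair}")
                if not 0.0 <= table[pair] <= 1.0:  # assumed range
                    raise ValueError(f"{name}[{pair}] = {table[pair]} is outside [0, 1]")


# Four layers connected in a chain, with the same dispersion for every pair
chain = [(0, 1), (1, 2), (2, 3)]
params = HnocParams(m=4, e=chain, d={p: 0.3 for p in chain}, noise={p: 0.05 for p in chain})
params.validate()  # passes silently
```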
Figure 20 illustrates two networks created with distinct parameter combinations. Unless stated otherwise, the parameters are set to the default values listed in Tables 1 and 3. Figure 20a depicts a 4-partite network with some overlap between communities.
The network was built so that all layers have the same number of communities, the probability parameter is set to produce balanced communities, the number of overlapping vertices differs across layers, and the dispersion (edge density) d is the same for all pairs of connected layers. Figure 20b illustrates a network with three layers and a heterogeneous structure, in which one of the layers also has edges between vertices of the same type. The example can describe a hypothetical author-paper-term network, in which authors are connected with their papers, and terms are connected with their neighboring terms in the text and with the papers in which they appear. The upper left, central and upper right layers represent, respectively, the Term, Paper and Author entities. The network was created so that the connections between different pairs of entities display different patterns. For example, connections between terms and papers are dense, whereas connections between terms, or between authors and papers, are sparser. The schema is inspired by the widely used real-world DBLP (Digital Bibliography & Library Project) data, a computer science bibliographic dataset that relates documents, authors and terms.
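The distinction drawn at the start of this appendix between bipartite, k-partite and heterogeneous networks can be checked directly from the list of connected type pairs. The helper below is an illustrative assumption, not part of HNOC. Applied to the author-paper-term schema of Fig. 20b, it reports a heterogeneous schema because of the Term-Term relation, and a 3-partite one once that relation is removed.

```python
def classify_schema(pairs):
    """Classify a schema given as (type, type) pairs of connected vertex types."""
    if not pairs:
        raise ValueError("schema has no connections")
    types = {t for pair in pairs for t in pair}
    if len(types) == 1:
        return "homogeneous"        # a single vertex type
    if any(a == b for a, b in pairs):
        return "heterogeneous"      # some edges join vertices of the same type
    if len(types) == 2:
        return "bipartite"
    return f"{len(types)}-partite"  # edges only between different types


dblp_like = [("Author", "Paper"), ("Term", "Paper"), ("Term", "Term")]
print(classify_schema(dblp_like))      # heterogeneous
print(classify_schema(dblp_like[:2]))  # 3-partite
```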
Cite this article
Valejo, A., Góes, F., Romanetto, L. et al. A benchmarking tool for the generation of bipartite network models with overlapping communities. Knowl Inf Syst 62, 1641–1669 (2020). https://doi.org/10.1007/s10115-019-01411-9
DOI: https://doi.org/10.1007/s10115-019-01411-9