Abstract
Motifs are small subgraph patterns that play a key role in understanding the structure and function of biological and social networks. The current de facto approach to assessing the statistical significance of a motif \(\mathcal {M}\) relies on counting its occurrences across the network and comparing that count to its expected count under some null generative model. This approach can be misleading due to combinatorial artifacts. That is, a motif may have a large count because many of its copies share vertices and edges attached to a single subgraph, such as a clique, that completes all of those copies.
In this work we introduce the novel concept of an (f, q)-spanning motif. A motif \(\mathcal {M}\) is (f, q)-spanning if there exists a q-fraction of the nodes that induces an f-fraction of the occurrences of \(\mathcal {M}\) in G. Intuitively, when f is close to 1, and q close to 0, most of the occurrences of \(\mathcal {M}\) are localized in a small set of nodes, and thus its statistical significance is likely to be due to a combinatorial artifact. We propose efficient heuristics for finding the maximum f for a given q and minimum q for a given f for which a motif is (f, q)-spanning and evaluate them on real-world datasets. Our methods successfully identify combinatorial artifacts that otherwise go undetected using the standard approach for assessing statistical significance.
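As an illustration of the definition above, the following is a minimal sketch, assuming the motif is a triangle and using a simple greedy peeling rule (repeatedly drop the node that lies in the fewest surviving triangles). It is not the heuristic proposed in the paper; the function names `triangles`, `max_f_for_q`, and `build_toy` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def triangles(adj):
    """Enumerate each triangle once, as a sorted tuple of its three nodes."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


def max_f_for_q(adj, q):
    """Greedy estimate of the largest f reachable with at most q * n nodes.

    Assumption: peeling the node with the fewest surviving triangles is only
    a heuristic, so the returned f is a lower bound on the true maximum.
    """
    tris = triangles(adj)
    if not tris:
        return 0.0, set()
    budget = int(q * len(adj))
    kept = set(adj)
    alive = set(tris)  # triangles whose three nodes are all still kept
    while len(kept) > budget:
        count = {u: 0 for u in kept}
        for t in alive:
            for u in t:
                count[u] += 1
        # Ties are broken by node label, so labels must be comparable.
        victim = min(kept, key=lambda u: (count[u], u))
        kept.remove(victim)
        alive = {t for t in alive if victim not in t}
    return len(alive) / len(tris), kept


def build_toy():
    """Toy graph: a 5-clique on nodes 0..4 plus five disjoint triangles."""
    adj = {u: set() for u in range(20)}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for u, v in combinations(range(5), 2):
        add_edge(u, v)
    for base in range(5, 20, 3):
        for u, v in combinations(range(base, base + 3), 2):
            add_edge(u, v)
    return adj


if __name__ == "__main__":
    f, kept = max_f_for_q(build_toy(), q=0.25)
    print(f, sorted(kept))
```

On this toy graph the five clique nodes (a quarter of all nodes) induce 10 of the 15 triangles, so the triangle is (2/3, 1/4)-spanning here even though every node participates in some triangle.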
Finally, we leverage the motif structure of a network to design MotifScope, an algorithm that takes as input a graph and two motifs \(\mathcal {M}_1, \mathcal {M}_2\), and finds subgraphs of the graph where \(\mathcal {M}_1, \mathcal {M}_2\) occur infrequently and frequently respectively. We show that a good selection of \(\mathcal {M}_1, \mathcal {M}_2\) allows us to find anomalies in large networks, including bipartite cliques in social graphs, and subgraphs rated with distrust in Bitcoin markets.
Notes
- 1.
It is worth noting that forcing \(f=1\), and thus simplifying the definition above to a (1, q)- or just q-spanning motif, is not robust in the following sense. Consider a graph that is the union of a linear number of node-disjoint triangles and a clique of order \(\sqrt{n}\). Each node in the graph participates in a triangle, and thus when \(f=1\), then \(q=1\). However, notice that most of the triangle occurrences appear in the small clique, i.e., \(\Theta((\sqrt{n})^3)=\Theta(n^{3/2})\gg \Theta(n)\). Thus for \(f=1-o(1)\) (roughly \(\frac{n^{3/2}}{n+n^{3/2}}\), ignoring constants), q suddenly drops to \(\Theta(\frac{\sqrt{n}}{n})=o(1)\). Similarly, a graph could have multiple distinct smaller combinatorial artifacts, in which case f might be a constant further from 1 (e.g., 3 small subgraphs, each with around 1/3 of the motif copies).
- 2.
While it aims to solve Problem 2, with minor changes it becomes a heuristic for Problem 1.
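To make the example in note 1 concrete, the counts can be written out explicitly. This is a sketch under two assumptions not stated in the note: the disjoint triangles cover n nodes (so there are n/3 of them), and the clique sits on \(\sqrt{n}\) additional nodes.

\[ T_{\text{clique}} = \binom{\sqrt{n}}{3} = \frac{\sqrt{n}\,(\sqrt{n}-1)(\sqrt{n}-2)}{6} = \Theta(n^{3/2}), \qquad T_{\text{disjoint}} = \frac{n}{3} \]

\[ f = \frac{T_{\text{clique}}}{T_{\text{clique}} + T_{\text{disjoint}}} = 1 - \Theta(n^{-1/2}), \qquad q = \frac{\sqrt{n}}{n + \sqrt{n}} = \Theta(n^{-1/2}) \]

For instance, with \(n = 900\) the clique has 30 nodes and \(\binom{30}{3} = 4060\) triangles against 300 disjoint ones, giving \(f = 4060/4360 \approx 0.93\) at \(q = 30/930 \approx 0.03\).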
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, T., Matejek, B., Mitzenmacher, M., Tsourakakis, C.E. (2023). Algorithmic Tools for Understanding the Motif Structure of Networks. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_1
DOI: https://doi.org/10.1007/978-3-031-26390-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26389-7
Online ISBN: 978-3-031-26390-3
eBook Packages: Computer Science, Computer Science (R0)