Algorithmic Tools for Understanding the Motif Structure of Networks

Chen, Tianyi; Matejek, Brian; Mitzenmacher, Michael; Tsourakakis, Charalampos E.

doi:10.1007/978-3-031-26390-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Motifs are small subgraph patterns that play a key role in understanding the structure and the function of biological and social networks. The current de facto approach to assessing the statistical significance of a motif $\mathcal {M}$ relies on counting its occurrences across the network and comparing that count to its expected count under some null generative model. This approach can be misleading due to combinatorial artifacts: a motif may have a large count because many of its copies share vertices and edges that connect to one subgraph, such as a clique, which completes all of those copies.

In this work we introduce the novel concept of an (f, q)-spanning motif. A motif $\mathcal {M}$ is (f, q)-spanning if there exists a q-fraction of the nodes that induces an f-fraction of the occurrences of $\mathcal {M}$ in G. Intuitively, when f is close to 1, and q close to 0, most of the occurrences of $\mathcal {M}$ are localized in a small set of nodes, and thus its statistical significance is likely to be due to a combinatorial artifact. We propose efficient heuristics for finding the maximum f for a given q and minimum q for a given f for which a motif is (f, q)-spanning and evaluate them on real-world datasets. Our methods successfully identify combinatorial artifacts that otherwise go undetected using the standard approach for assessing statistical significance.
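To make the definition concrete, the following is a minimal Python sketch for the triangle motif. It computes the pair (f, q) for a given node subset and picks a subset with a simple top-k rule. The function names, the toy graph, and the top-k selection rule are illustrative assumptions; they are not the heuristics proposed in the paper.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Return every triangle exactly once, as a sorted node triple."""
    found = set()
    for u, nbrs in adj.items():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


def spanning_fraction(adj, subset):
    """Return (f, q): the fraction of triangles induced by `subset`
    and the fraction of nodes that `subset` contains."""
    subset = set(subset)
    tris = triangles(adj)
    induced = sum(1 for t in tris if subset.issuperset(t))
    f = induced / len(tris) if tris else 0.0
    q = len(subset) / len(adj)
    return f, q


def top_k_by_participation(adj, k):
    """Assumed selection rule: keep the k nodes that lie in the most triangles."""
    counts = {u: 0 for u in adj}
    for t in triangles(adj):
        for u in t:
            counts[u] += 1
    ranked = sorted(adj, key=lambda u: (-counts[u], u))
    return set(ranked[:k])


# Toy graph: a 4-clique (4 triangles) plus two node-disjoint triangles.
clique_edges = list(combinations(range(4), 2))
disjoint_edges = [(4, 5), (5, 6), (4, 6), (7, 8), (8, 9), (7, 9)]
adj = build_adj(clique_edges + disjoint_edges)

subset = top_k_by_participation(adj, 4)
f, q = spanning_fraction(adj, subset)
print(sorted(subset), f, q)
```

In this toy graph the four clique nodes are 40% of the nodes yet induce two thirds of the triangles, which is the kind of localization the (f, q)-spanning definition is meant to expose.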

Finally, we leverage the motif structure of a network to design MotifScope, an algorithm that takes as input a graph and two motifs $\mathcal {M}_1, \mathcal {M}_2$, and finds subgraphs of the graph where $\mathcal {M}_1$ occurs infrequently and $\mathcal {M}_2$ occurs frequently. We show that a good selection of $\mathcal {M}_1, \mathcal {M}_2$ allows us to find anomalies in large networks, including bipartite cliques in social graphs, and subgraphs rated with distrust in Bitcoin markets.

Notes

1. It is worth noting that forcing $f=1$, and thus simplifying the definition above to a (1, q)-spanning, or just q-spanning, motif is not robust in the following sense. Consider a graph that is the union of a linear number of node-disjoint triangles and a clique of order $\sqrt{n}$. Each node in the graph participates in a triangle, and thus when $f=1$, then $q=1$. However, most of the triangle occurrences appear in the small clique, i.e., $O((\sqrt{n})^3)=O(n^{3/2})\gg O(n)$. Thus for $f=O(\frac{n^{3/2}}{n+n^{3/2}})=1-o(1)$, q suddenly becomes $O(\frac{\sqrt{n}}{n})=o(1)$. Similarly, a graph could have multiple distinct smaller combinatorial artifacts, in which case f might be a constant further from 1 (e.g., 3 small subgraphs, each with around 1/3 of the motif copies).
2. While it aims to solve Problem 2, with minor changes it becomes a heuristic for Problem 1.
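The construction in note 1 can be checked numerically. The sketch below counts triangles in closed form for a graph made of node-disjoint triangles plus one clique whose order is about the square root of the number of triangle nodes. The function name and the concrete size are assumptions chosen for illustration.

```python
from math import comb, isqrt


def clique_artifact_fractions(num_disjoint_triangles):
    """Return (f, q) when the chosen node subset is the clique alone.

    The graph is the union of `num_disjoint_triangles` node-disjoint
    triangles and one clique of order floor(sqrt(3 * num_disjoint_triangles)).
    """
    triangle_nodes = 3 * num_disjoint_triangles
    clique_order = isqrt(triangle_nodes)      # roughly sqrt(n)
    clique_triangles = comb(clique_order, 3)  # every node triple of a clique is a triangle
    total_triangles = clique_triangles + num_disjoint_triangles
    f = clique_triangles / total_triangles
    q = clique_order / (triangle_nodes + clique_order)
    return f, q


f, q = clique_artifact_fractions(10_000)
print(f"f = {f:.4f}, q = {q:.4f}")
```

With 10,000 disjoint triangles the clique holds well under 1% of the nodes but more than 98% of the triangles, whereas insisting on f = 1 would force q = 1 because every node lies in some triangle.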

Author information

Authors and Affiliations

Boston University, Boston, MA, USA
Tianyi Chen & Charalampos E. Tsourakakis
Harvard University, Cambridge, MA, USA
Brian Matejek, Michael Mitzenmacher & Charalampos E. Tsourakakis
Computer Science Laboratory, SRI International, Washington D.C., USA
Brian Matejek
ISI Foundation, Turin, Italy
Charalampos E. Tsourakakis

Corresponding author

Correspondence to Charalampos E. Tsourakakis.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d'Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 643 KB)

About this paper

Cite this paper

Chen, T., Matejek, B., Mitzenmacher, M., Tsourakakis, C.E. (2023). Algorithmic Tools for Understanding the Motif Structure of Networks. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_1

DOI: https://doi.org/10.1007/978-3-031-26390-3_1
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26389-7
Online ISBN: 978-3-031-26390-3
eBook Packages: Computer Science, Computer Science (R0)