
Algorithmic Tools for Understanding the Motif Structure of Networks

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13714)


Abstract

Motifs are small subgraph patterns that play a key role in understanding the structure and function of biological and social networks. The current de facto approach to assessing the statistical significance of a motif \(\mathcal {M}\) is to count its occurrences in the network and compare that count with its expected count under some null generative model. This approach can be misleading because of combinatorial artifacts: a motif may have a large count simply because many of its copies share vertices and edges and are all completed by the same subgraph, such as a clique.
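
A minimal sketch of the count-versus-null-model test described above, assuming triangles as the motif \(\mathcal{M}\) and an Erdős–Rényi \(G(n, p)\) null model whose \(p\) matches the observed edge density. The function names and the toy graph are hypothetical, and null models used in practice (for example, degree-preserving ones) may differ. The example shows how a single clique can inflate the raw count.

```python
from math import comb


def build_adj(edges):
    """Adjacency sets for an undirected simple graph with comparable node labels."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_count(adj):
    """Count each triangle exactly once, as the ordered triple u < v < w."""
    return sum(1 for u in adj for v in adj[u] if v > u
               for w in adj[u] & adj[v] if w > v)


def expected_triangles_gnp(n, m):
    """Expected triangle count in G(n, p) with p = m / C(n, 2)."""
    p = m / comb(n, 2)
    return comb(n, 3) * p ** 3


# Hypothetical example: a 20-clique plus a 200-node path (the path has no triangles).
clique = [(u, v) for u in range(20) for v in range(u + 1, 20)]
path = [(i, i + 1) for i in range(20, 219)]
adj = build_adj(clique + path)
observed = triangle_count(adj)
expected = expected_triangles_gnp(len(adj), len(clique) + len(path))
print(observed, round(expected, 1))
```

All \(\binom{20}{3}=1140\) observed triangles come from the clique, far above the \(G(n,p)\) expectation, even though the other 200 nodes contain no triangles at all. The significance test alone cannot tell these two situations apart.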

In this work we introduce the novel concept of an (f, q)-spanning motif. A motif \(\mathcal {M}\) is (f, q)-spanning in a graph G if there exists a q-fraction of the nodes of G that induces an f-fraction of the occurrences of \(\mathcal {M}\) in G. Intuitively, when f is close to 1 and q is close to 0, most of the occurrences of \(\mathcal {M}\) are localized in a small set of nodes, and thus its statistical significance is likely due to a combinatorial artifact. We propose efficient heuristics for finding the maximum f for a given q, and the minimum q for a given f, for which a motif is (f, q)-spanning, and we evaluate them on real-world datasets. Our methods successfully identify combinatorial artifacts that otherwise go undetected by the standard approach to assessing statistical significance.
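
One way to explore both directions of the (f, q)-spanning question, sketched under stated assumptions: triangles serve as the motif, and the search is a greedy peeling heuristic in the style of Charikar's densest-subgraph algorithm (repeatedly delete the node that lies in the fewest surviving occurrences and record f after each deletion). This is an illustrative stand-in, not necessarily the heuristics proposed in the paper, and all function names are hypothetical.

```python
def build_adj(edges):
    """Adjacency sets for an undirected simple graph with comparable node labels."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Every triangle as a sorted node triple (u < v < w)."""
    return [(u, v, w) for u in adj for v in adj[u] if v > u
            for w in adj[u] & adj[v] if w > v]


def peeling_profile(adj):
    """Greedily delete the node lying in the fewest surviving triangles.

    Returns (nodes_alive, f, alive_set) before and after each deletion, where f
    is the fraction of all triangles whose three nodes are still alive.
    """
    occ = triangles(adj)
    total = len(occ)
    member = {u: set() for u in adj}  # ids of the triangles containing each node
    for i, t in enumerate(occ):
        for u in t:
            member[u].add(i)
    alive, live = set(adj), set(range(total))

    def frac():
        return len(live) / total if total else 0.0

    profile = [(len(alive), frac(), frozenset(alive))]
    while alive:
        u = min(alive, key=lambda x: len(member[x] & live))
        alive.remove(u)
        live -= member[u]  # a triangle dies as soon as one of its nodes is deleted
        profile.append((len(alive), frac(), frozenset(alive)))
    return profile


def max_f_given_q(adj, q):
    """Largest f among peeled node sets with at most q * n nodes."""
    n = len(adj)
    _, f, nodes = max((e for e in peeling_profile(adj) if e[0] <= q * n),
                      key=lambda e: e[1])
    return f, nodes


def min_q_given_f(adj, f_target):
    """Smallest q among peeled node sets that still induce >= f_target of the triangles."""
    n = len(adj)
    k, _, nodes = min((e for e in peeling_profile(adj) if e[1] >= f_target),
                      key=lambda e: e[0])
    return k / n, nodes


# Hypothetical example: a 6-clique (nodes 0-5) plus 10 node-disjoint triangles.
edges = [(u, v) for u in range(6) for v in range(u + 1, 6)]
for b in range(6, 36, 3):
    edges += [(b, b + 1), (b + 1, b + 2), (b, b + 2)]
adj = build_adj(edges)
print(max_f_given_q(adj, 0.2)[0], min_q_given_f(adj, 0.6)[0])
```

On this toy graph the peeling deletes the disjoint-triangle nodes first, so both queries keep the 6-clique, which holds \(\binom{6}{3}=20\) of the 30 triangles (f = 2/3) on 6 of the 36 nodes (q = 1/6).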

Finally, we leverage the motif structure of a network to design MotifScope, an algorithm that takes as input a graph and two motifs \(\mathcal {M}_1, \mathcal {M}_2\) and finds subgraphs in which \(\mathcal {M}_1\) occurs infrequently and \(\mathcal {M}_2\) occurs frequently. We show that a good selection of \(\mathcal {M}_1, \mathcal {M}_2\) allows us to find anomalies in large networks, including bipartite cliques in social graphs and subgraphs rated with distrust in Bitcoin markets.
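
A toy sketch of the idea under explicit assumptions: \(\mathcal{M}_1\) is the triangle (should be rare) and \(\mathcal{M}_2\) is the wedge, i.e., a path on three nodes (should be common). The subgraph score is (wedges minus 3 times triangles) per node, optimized by greedy peeling. The objective, the peeling rule, and all names are hypothetical illustrations, not the MotifScope algorithm itself; the example only shows why this motif pair favors bipartite cliques.

```python
from math import comb


def build_adj(edges):
    """Adjacency sets for an undirected simple graph with comparable node labels."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def open_wedge_density(adj, nodes):
    """(wedges - 3 * triangles) / |nodes| on the subgraph induced by `nodes`.

    Every triangle closes exactly 3 wedges, so the numerator counts open wedges:
    many inside a bipartite clique, zero inside a clique.
    """
    if not nodes:
        return float("-inf")
    sub = {u: adj[u] & nodes for u in nodes}
    wedges = sum(comb(len(nb), 2) for nb in sub.values())
    tri = sum(1 for u in sub for v in sub[u] if v > u
              for w in sub[u] & sub[v] if w > v)
    return (wedges - 3 * tri) / len(nodes)


def contrast_peel(adj):
    """Delete the node whose removal maximizes the score; return the best set seen."""
    nodes = set(adj)
    best_score, best_nodes = open_wedge_density(adj, nodes), set(nodes)
    while len(nodes) > 1:
        u = max(nodes, key=lambda x: open_wedge_density(adj, nodes - {x}))
        nodes.remove(u)
        score = open_wedge_density(adj, nodes)
        if score > best_score:
            best_score, best_nodes = score, set(nodes)
    return best_nodes, best_score


# Hypothetical example: K_{4,4} on nodes 0-7, K_6 on nodes 8-13, one bridge edge.
edges = [(a, b) for a in range(4) for b in range(4, 8)]
edges += [(u, v) for u in range(8, 14) for v in range(u + 1, 14)]
edges.append((0, 8))
adj = build_adj(edges)
print(contrast_peel(adj))
```

The peeling strips the \(K_6\), whose wedges are all closed, and returns the bipartite clique on nodes 0-7, which has 48 open wedges on 8 nodes (score 6).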


Notes

  1.

    It is worth noting that forcing \(f=1\), and thus simplifying the definition above to a (1, q)- or just q-spanning motif, is not robust in the following sense. Consider a graph that is the union of a linear number of node-disjoint triangles and a clique of order \(\sqrt{n}\). Each node in the graph participates in a triangle, so \(f=1\) forces \(q=1\). However, most of the triangle occurrences appear in the small clique, which contains \(\Theta((\sqrt{n})^3)=\Theta(n^{3/2})\gg \Theta(n)\) triangles. Thus for \(f=1-O(n^{-1/2})=1-o(1)\), q suddenly becomes \(O(\frac{\sqrt{n}}{n})=o(1)\). Similarly, a graph could have multiple distinct smaller combinatorial artifacts, in which case f might be a constant further from 1 (e.g., 3 small subgraphs, each containing around 1/3 of the motif copies).

  2.

    While it aims to solve Problem 2, with minor changes it becomes a heuristic for Problem 1.
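
The asymptotics in Note 1 can be checked with a few lines of arithmetic. This sketch assumes the clique has order \(\lfloor\sqrt{3t}\rfloor\) for \(t\) disjoint triangles (so it is roughly \(\sqrt{n}\)) and takes exactly the clique nodes as the candidate set; the helper name is hypothetical.

```python
from math import comb, isqrt


def note1_f_q(t):
    """f and q for the clique nodes in t disjoint triangles plus a clique.

    Assumption: the clique has order k = isqrt(3t), i.e. about sqrt(n) for n ~ 3t.
    """
    k = isqrt(3 * t)
    n = 3 * t + k              # disjoint-triangle nodes plus clique nodes
    in_clique = comb(k, 3)     # triangles induced by the clique nodes
    total = t + in_clique      # every triangle in the graph
    return in_clique / total, k / n


for t in (10**2, 10**4, 10**6):
    f, q = note1_f_q(t)
    print(f"t={t}: f={f:.4f}, q={q:.5f}")
```

As t grows, f approaches 1 while q shrinks toward 0, whereas insisting on f = 1 would force q = 1.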


Author information

Correspondence to Charalampos E. Tsourakakis.

Electronic supplementary material

Supplementary material 1 (pdf 643 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, T., Matejek, B., Mitzenmacher, M., Tsourakakis, C.E. (2023). Algorithmic Tools for Understanding the Motif Structure of Networks. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13714. Springer, Cham. https://doi.org/10.1007/978-3-031-26390-3_1


  • DOI: https://doi.org/10.1007/978-3-031-26390-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26389-7

  • Online ISBN: 978-3-031-26390-3

  • eBook Packages: Computer Science, Computer Science (R0)
