
Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2012)

Abstract

Social network analysis has become a major research area with impact on diverse applications ranging from search engines to product recommendation systems. A major problem in implementing social network analysis algorithms is the sheer size of many social networks: for example, the Facebook graph has more than 900 million vertices, and even small networks may have tens of millions of vertices. One solution for dealing with these large graphs is dimensionality reduction using spectral or SVD analysis of the adjacency matrix of the network, but these global techniques do not necessarily take into account the local structures or clusters of the network that are critical in network analysis. A more promising approach is clustered low-rank approximation: instead of computing a global low-rank approximation, the adjacency matrix is first clustered, and then a low-rank approximation of each cluster (i.e., diagonal block) is computed. The resulting algorithm is challenging to parallelize not only because of the large size of the data sets in social network analysis, but also because it requires computing with very diverse data structures ranging from extremely sparse matrices to dense matrices. In this paper, we describe the first parallel implementation of a clustered low-rank approximation algorithm for large social network graphs, and use it to perform link prediction in parallel. Experimental results show that this implementation scales well on large distributed-memory machines. For example, on a Twitter graph with roughly 11 million vertices and 63 million edges, our implementation scales by a factor of 86 on 128 processes and takes less than 2300 seconds, while on a much larger Twitter graph with 41 million vertices and 1.2 billion edges, it scales by a factor of 203 on 256 processes with a running time of about 4800 seconds.
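To make the idea in the abstract concrete, the following is a minimal serial sketch in Python with NumPy, not the parallel implementation the paper describes. It assumes the cluster labels are already given, uses a dense symmetric adjacency matrix, and takes a truncated eigendecomposition as the per-cluster low-rank approximation. The coupling step that projects every block onto the per-cluster bases (S_ij = V_i^T A_ij V_j) is an assumption of this sketch, as are all function and variable names; the abstract itself states only that each diagonal block is approximated.

```python
import numpy as np


def clustered_low_rank(A, labels, k):
    """Sketch of a clustered low-rank approximation of a symmetric matrix A.

    labels[v] is the (precomputed) cluster id of vertex v; k is the rank
    kept per cluster. Returns the vertex index sets, the per-cluster bases
    V_i, and the small coupling matrix S (assumed form: S_ij = V_i^T A_ij V_j).
    """
    labels = np.asarray(labels)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]

    # Low-rank basis of each diagonal block: the k eigenvectors of A_ii
    # whose eigenvalues are largest in magnitude.
    bases = []
    for idx in clusters:
        w, v = np.linalg.eigh(A[np.ix_(idx, idx)])
        top = np.argsort(-np.abs(w))[:k]
        bases.append(v[:, top])

    # Project every block (diagonal and off-diagonal) onto the bases.
    offsets = np.concatenate(([0], np.cumsum([b.shape[1] for b in bases])))
    S = np.zeros((offsets[-1], offsets[-1]))
    for i, (ri, Vi) in enumerate(zip(clusters, bases)):
        for j, (cj, Vj) in enumerate(zip(clusters, bases)):
            block = Vi.T @ A[np.ix_(ri, cj)] @ Vj
            S[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = block
    return clusters, bases, S


def reconstruct(n, clusters, bases, S):
    """Form the dense approximation V S V^T (only sensible for small n)."""
    V = np.zeros((n, S.shape[0]))
    col = 0
    for idx, Vi in zip(clusters, bases):
        V[idx, col:col + Vi.shape[1]] = Vi
        col += Vi.shape[1]
    return V @ S @ V.T


def top_predicted_link(A, A_hat):
    """Return the non-adjacent vertex pair (u, v), u < v, with the highest score."""
    scores = np.triu(np.where(A == 0, A_hat, -np.inf), k=1)
    scores[np.tril_indices_from(scores)] = -np.inf
    return np.unravel_index(np.argmax(scores), scores.shape)


if __name__ == "__main__":
    # Toy graph: two triangles joined by the single edge (2, 3).
    A = np.zeros((6, 6))
    for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        A[u, v] = A[v, u] = 1.0
    clusters, bases, S = clustered_low_rank(A, [0, 0, 0, 1, 1, 1], k=1)
    A_hat = reconstruct(6, clusters, bases, S)
    print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
    print("top predicted link:", top_predicted_link(A, A_hat))
```

In this sketch the entries of the reconstructed matrix serve as link-prediction scores for vertex pairs that are not yet connected. At the scale discussed in the abstract the adjacency matrix would be stored in a sparse format while the per-cluster bases are dense, which is the mix of data structures the abstract identifies as a challenge for parallelization.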




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sui, X. et al. (2013). Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_6


  • DOI: https://doi.org/10.1007/978-3-642-37658-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37657-3

  • Online ISBN: 978-3-642-37658-0

  • eBook Packages: Computer Science, Computer Science (R0)
