Abstract
Social network analysis has become a major research area with impact on diverse applications, ranging from search engines to product recommendation systems. A major problem in implementing social network analysis algorithms is the sheer size of many social networks: for example, the Facebook graph has more than 900 million vertices, and even small networks may have tens of millions of vertices. One way to deal with these large graphs is dimensionality reduction using spectral or SVD analysis of the network's adjacency matrix, but these global techniques do not necessarily take into account the local structures or clusters of the network that are critical in network analysis. A more promising approach is clustered low-rank approximation: instead of computing a global low-rank approximation, the adjacency matrix is first clustered, and then a low-rank approximation of each cluster (i.e., each diagonal block) is computed. The resulting algorithm is challenging to parallelize, not only because of the large size of the data sets in social network analysis, but also because it requires computing with very diverse data structures, ranging from extremely sparse matrices to dense matrices. In this paper, we describe the first parallel implementation of a clustered low-rank approximation algorithm for large social network graphs, and use it to perform link prediction in parallel. Experimental results show that this implementation scales well on large distributed-memory machines. For example, on a Twitter graph with roughly 11 million vertices and 63 million edges, our implementation scales by a factor of 86 on 128 processes and takes less than 2300 seconds; on a much larger Twitter graph with 41 million vertices and 1.2 billion edges, it scales by a factor of 203 on 256 processes, with a running time of about 4800 seconds.
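The abstract describes clustered low-rank approximation only in outline. Below is a minimal pure-Python sketch of one common formulation. The way off-diagonal blocks are handled is an assumption, because the abstract does not specify it. Each diagonal block A_ii gets a rank-k eigen-approximation with basis V_i. The bases form a block-diagonal V, and the small matrix S = V^T A V couples them, so that A ≈ V S V^T. A candidate link (u, w) is then scored by the entry (V S V^T)[u][w]. This scoring rule is a simple illustrative stand-in, not necessarily the link-prediction measure used in the paper. The sketch also rests on three further assumptions. The clusters are taken as given, where in practice they would come from a graph-clustering step. A dense Jacobi eigensolver stands in for the sparse iterative solvers a real implementation would need. The names `jacobi_eigh`, `clustered_lra` and `link_score`, and the toy graph, are all hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a small dense symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column j of V is the eigenvector for eigenvalues[j].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def clustered_lra(adj, clusters, k):
    """Hypothetical clustered rank-k approximation A ~ V S V^T with block-diagonal V.

    adj: dense symmetric adjacency matrix (list of lists).
    clusters: list of vertex-index lists, assumed to come from a prior clustering step.
    """
    bases = []  # bases[i]: k eigenvectors of diagonal block A_ii, in local indices
    for verts in clusters:
        block = [[adj[u][w] for w in verts] for u in verts]
        vals, vecs = jacobi_eigh(block)
        # Largest-|eigenvalue| pairs give the best rank-k approximation of a symmetric block.
        top = sorted(range(len(verts)), key=lambda j: -abs(vals[j]))[:k]
        bases.append([[vecs[r][j] for r in range(len(verts))] for j in top])
    s = []
    for i, ci in enumerate(clusters):
        row = []
        for j, cj in enumerate(clusters):
            # S_ij = V_i^T A_ij V_j (k_i x k_j); off-diagonal blocks carry inter-cluster links.
            row.append([[sum(x[pu] * adj[u][w] * y[pw]
                             for pu, u in enumerate(ci) for pw, w in enumerate(cj))
                         for y in bases[j]] for x in bases[i]])
        s.append(row)
    where = {u: (i, pos) for i, verts in enumerate(clusters) for pos, u in enumerate(verts)}
    return {"bases": bases, "S": s, "where": where}


def link_score(model, u, w):
    """Approximate entry (V S V^T)[u][w]; larger values suggest a more likely link."""
    i, pu = model["where"][u]
    j, pw = model["where"][w]
    left = [col[pu] for col in model["bases"][i]]
    right = [col[pw] for col in model["bases"][j]]
    sij = model["S"][i][j]
    return sum(left[a] * sij[a][b] * right[b]
               for a in range(len(left)) for b in range(len(right)))


# Toy graph (hypothetical): two 4-vertex clusters, each a clique missing one edge,
# joined by the single bridge edge 3-4.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = [[0] * 8 for _ in range(8)]
for u, w in edges:
    adj[u][w] = adj[w][u] = 1
model = clustered_lra(adj, [[0, 1, 2, 3], [4, 5, 6, 7]], k=1)
print(round(link_score(model, 0, 1), 3))  # missing intra-cluster edge: 0.485
print(round(link_score(model, 0, 7), 3))  # pair linked only via the bridge: 0.059
```

On the toy graph, the missing intra-cluster pair (0, 1) scores well above the pair (0, 7), which is connected only through the bridge edge. V is block diagonal, so each cluster's eigenproblem is independent of the others. That per-cluster work is a natural unit to distribute across processes.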
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sui, X. et al. (2013). Parallel Clustered Low-Rank Approximation of Graphs and Its Application to Link Prediction. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_6
DOI: https://doi.org/10.1007/978-3-642-37658-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer Science, Computer Science (R0)