Distributed large-scale natural graph factorization

Authors: Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J. Smola

WWW '13: Proceedings of the 22nd international conference on World Wide Web

Pages 37 - 48

https://doi.org/10.1145/2488388.2488393

Published: 13 May 2013

Abstract

Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive through the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models have been proposed to study such graphs, their analysis is still difficult due to the scale and nature of the data.

We propose a framework for large-scale graph decomposition and inference. To resolve the scale, our framework is distributed so that the data are partitioned over a shared-nothing set of machines. We propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices rather than edges across partitions. Our decomposition is based on a streaming algorithm. It is network-aware as it adapts to the network topology of the underlying computational hardware. We use local copies of the variables and an efficient asynchronous communication protocol to synchronize the replicated values in order to perform most of the computation without having to incur the cost of network communication. On a graph of 200 million vertices and 10 billion edges, derived from an email communication network, our algorithm retains convergence properties while allowing for almost linear scalability in the number of computers.
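
As an illustration only, and not the authors' implementation, the difference between counting cut edges and counting neighboring vertices across partitions can be sketched in Python. The function names `edge_cut`, `remote_neighbors`, and `replica_count`, the toy graphs, and the partition labels are all hypothetical. The sketch assumes each partition must hold a local copy of every out-of-partition vertex that one of its own vertices touches.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def remote_neighbors(edges, part):
    """Map each partition to the set of vertices it neighbors but does not own."""
    remote = defaultdict(set)
    for u, v in edges:
        if part[u] != part[v]:
            remote[part[u]].add(v)  # owner of u needs a local copy of v
            remote[part[v]].add(u)  # owner of v needs a local copy of u
    return remote


def replica_count(edges, part):
    """Total number of local copies needed across all partitions."""
    return sum(len(vs) for vs in remote_neighbors(edges, part).values())


# Two toy graphs on vertices 0..5, split into partitions "A" and "B".
part = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
matching = [(0, 3), (1, 4), (2, 5)]  # three disjoint cross-partition edges
star = [(0, 3), (0, 4), (0, 5)]      # one hub in A linked to three vertices in B

for name, edges in [("matching", matching), ("star", star)]:
    print(name, edge_cut(edges, part), replica_count(edges, part))
```

Both toy graphs cut three edges, yet the matching needs six replicated vertices while the star needs four, which is why the two objectives can prefer different partitions.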

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN:9781450320351

DOI:10.1145/2488388

General Chairs: Daniel Schwabe (PUC-Rio - Brazil), Virgílio Almeida (UFMG - Brazil), Hartmut Glaser (CGI.br - Brazil)

Program Chairs: Ricardo Baeza-Yates (Yahoo! Labs - Spain & Chile), Sue Moon (KAIST - South Korea)

Copyright © 2013 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Núcleo de Informação e Coordenação do Ponto BR
CGIBR: Comitê Gestor da Internet no Brasil

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW '13

Sponsor:

NICBR
CGIBR

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

Total Citations: 459
Total Downloads: 1,797
Downloads (last 12 months): 99
Downloads (last 6 weeks): 8

Reflects downloads up to 11 Feb 2025

Cited By

Pan Z, Cai F, Liu X, Chen H (2025). Distance-Aware Learning for Inductive Link Prediction on Temporal Networks. IEEE Transactions on Neural Networks and Learning Systems 36:1 (978-990). Online publication date: Jan-2025.
https://doi.org/10.1109/TNNLS.2023.3328924
Liu W, Hao H, Wang H, Zou Z, Xing W (2025). Introduction. Graph Neural Network Methods and Applications in Scene Understanding (1-24). Online publication date: 4-Jan-2025.
https://doi.org/10.1007/978-981-97-9933-6_1
Lee S, Zhang Q, Kondor R, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). Sign rank limitations for inner product graph decoders. Proceedings of the 41st International Conference on Machine Learning (27118-27136). Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3693151
Gao Z, Dong D, Tan C, Xia J, Hu B, Li S, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). A graph is worth K words. Proceedings of the 41st International Conference on Machine Learning (14681-14701). Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3692656
Opanin Gyamfi E, Qin Z, Mantebea Danso J, Adu-Gyamfi D (2024). Hierarchical Graph Neural Network: A Lightweight Image Matching Model with Enhanced Message Passing of Local and Global Information in Hierarchical Graph Neural Networks. Information 15:10 (602). Online publication date: 30-Sep-2024.
https://doi.org/10.3390/info15100602
You Y, Tang C, Liu X, Zou X, Liu Y, Jiang L, Zhang C (2024). High-order graph fusion for multi-view clustering. SCIENTIA SINICA Informationis 54:9 (2098). Online publication date: 10-Sep-2024.
https://doi.org/10.1360/SSI-2023-0217
Alvarez-Mamani E, Dechant R, Beltran-Castañón C, Ibáñez A (2024). Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 25:1. Online publication date: 2-Jan-2024.
https://doi.org/10.1186/s12859-023-05612-6
Huynh T, Nguyen T, Nguyen T, Nguyen P, Yin H, Nguyen Q, Nguyen T (2024). Certified Unlearning for Federated Recommendation. ACM Transactions on Information Systems. Online publication date: 2-Dec-2024.
https://doi.org/10.1145/3706419
Touahria Miliani M, Sadat S, Haddad M, Seba H, Amrouche K (2024). Comparing Hyperbolic Graph Embedding models on Anomaly Detection for Cybersecurity. Proceedings of the 19th International Conference on Availability, Reliability and Security (1-11). Online publication date: 30-Jul-2024.
https://dl.acm.org/doi/10.1145/3664476.3670445
Yan Y, Hu Y, Zhou Q, Liu L, Zeng Z, Chen Y, Pan M, Chen H, Das M, Tong H, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). PaCEr: Network Embedding From Positional to Structural. Proceedings of the ACM Web Conference 2024 (2485-2496). Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645516
