DOI: 10.1145/2488388.2488393
research-article

Distributed large-scale natural graph factorization

Published: 13 May 2013

Abstract

Natural graphs, such as social networks, email graphs, or instant messaging patterns, have become pervasive on the internet. These graphs are massive, often containing hundreds of millions of nodes and billions of edges. While some theoretical models have been proposed to study such graphs, their analysis is still difficult due to the scale and nature of the data.
We propose a framework for large-scale graph decomposition and inference. To handle the scale, our framework is distributed so that the data are partitioned over a shared-nothing set of machines. We propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighboring vertices rather than edges across partitions. Our decomposition is based on a streaming algorithm. It is network-aware in that it adapts to the network topology of the underlying computational hardware. We use local copies of the variables and an efficient asynchronous communication protocol to synchronize the replicated values, so that most of the computation can be performed without incurring the cost of network communication. On a graph of 200 million vertices and 10 billion edges, derived from an email communication network, our algorithm retains convergence properties while allowing for almost linear scalability in the number of computers.
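The abstract distinguishes counting neighboring vertices from counting cut edges. A small sketch makes that distinction concrete. It is a hypothetical illustration, not the authors' partitioner. It assumes the following:

- The input is an undirected edge list.
- There are `k` partitions, each with a fixed `capacity`.
- A simple greedy streaming rule is used: each arriving vertex goes to the partition holding most of its already-placed neighbors.

It ignores the network-aware aspect entirely, and all function names are illustrative. `edge_cut` counts cut edges. `ghost_count` counts, for every partition, the remote neighbors whose values that partition would have to keep as local copies.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_cut(edges, part):
    """Number of edges whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def ghost_count(adj, part):
    """Number of distinct (partition, remote vertex) pairs.

    Each remote neighbor of a partition is a vertex whose value that
    partition would have to hold as a local copy.
    """
    ghosts = set()
    for u, nbrs in adj.items():
        for v in nbrs:
            if part[v] != part[u]:
                ghosts.add((part[u], v))
    return len(ghosts)


def greedy_stream_partition(stream, adj, k, capacity):
    """Hypothetical greedy streaming rule: assign vertices in stream order to
    the non-full partition holding most of their already-placed neighbors.
    Ties go to the least-loaded partition, then to the lowest index."""
    part, load = {}, [0] * k
    for u in stream:
        counts = [0] * k
        for v in adj.get(u, ()):
            if v in part:
                counts[part[v]] += 1
        open_parts = [p for p in range(k) if load[p] < capacity]
        if not open_parts:
            raise ValueError("capacity * k is smaller than the number of vertices")
        best = max(open_parts, key=lambda p: (counts[p], -load[p]))
        part[u] = best
        load[best] += 1
    return part


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = build_adjacency(edges)
    part = greedy_stream_partition(range(6), adj, k=2, capacity=3)
    print(part, edge_cut(edges, part), ghost_count(adj, part))
```

Take the two joined triangles, streamed in the order 0 to 5 with `k=2` and `capacity=3`. The greedy rule places each triangle in its own partition. This gives an edge cut of 1 and 2 local copies: vertex 3 on partition 0 and vertex 2 on partition 1.

The two metrics can disagree. Let A = {0, 1, 2} and B = {3, 4, 5}. The hub edges (0,3), (0,4), (0,5) and the matching edges (0,3), (1,4), (2,5) both cut 3 edges. However, the hub needs 4 local copies, while the matching needs 6.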
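The abstract describes computing on local copies and then reconciling the replicated values. That idea can also be sketched for the factorization itself. This is a minimal single-process simulation under stated assumptions, not the paper's method:

- **Objective (assumed).** It uses the common symmetric graph-factorization objective f(Z) = ½ Σ over edges (i, j) of (Y_ij − ⟨Z_i, Z_j⟩)² + (λ/2) Σ_i ‖Z_i‖², with λ written `lam` in the code. This objective is an assumption and is not stated in the text above.
- **Optimizer.** It runs plain SGD on each simulated machine's edges.
- **Synchronization.** It swaps in a simple synchronous averaging step in place of the asynchronous protocol described in the abstract.

All function names and hyperparameters are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def objective(edges, Z, lam):
    """0.5 * sum of squared edge residuals + 0.5 * lam * sum of squared norms."""
    fit = 0.5 * sum((y - dot(Z[i], Z[j])) ** 2 for i, j, y in edges)
    reg = 0.5 * lam * sum(dot(z, z) for z in Z.values())
    return fit + reg


def local_sgd_epoch(edges, Z, lr, lam):
    """One SGD pass over a machine's edges, updating only its local copies Z.

    Per-edge loss: 0.5*(y - <z_i, z_j>)^2 + 0.5*lam*(|z_i|^2 + |z_j|^2),
    a common SGD approximation of the per-vertex penalty. Assumes i != j.
    """
    for i, j, y in edges:
        zi, zj = Z[i], Z[j]
        err = y - dot(zi, zj)
        for d in range(len(zi)):
            # Both gradients use the values from before this dimension's update.
            gi = -err * zj[d] + lam * zi[d]
            gj = -err * zi[d] + lam * zj[d]
            zi[d] -= lr * gi
            zj[d] -= lr * gj


def synchronize(master, replicas):
    """Average each vertex over every machine holding a copy, then write the
    average back to the master table and to all of those copies."""
    for v in list(master):
        copies = [Z[v] for Z in replicas if v in Z]
        if not copies:
            continue  # vertex touched by no machine: leave it unchanged
        dim = len(master[v])
        avg = [sum(c[d] for c in copies) / len(copies) for d in range(dim)]
        master[v] = avg
        for Z in replicas:
            if v in Z:
                Z[v] = list(avg)


def train(partitions, n_vertices, rank=2, epochs=200, lr=0.05, lam=0.01, seed=0):
    """Simulate machines sequentially: each runs SGD on its own edges using
    local copies, and all copies are reconciled after every epoch."""
    rng = random.Random(seed)
    master = {v: [rng.uniform(-0.5, 0.5) for _ in range(rank)]
              for v in range(n_vertices)}
    # Each machine keeps a local copy of every vertex its edges touch.
    replicas = []
    for edges in partitions:
        touched = {v for i, j, _ in edges for v in (i, j)}
        replicas.append({v: list(master[v]) for v in touched})
    for _ in range(epochs):
        for edges, Z in zip(partitions, replicas):
            local_sgd_epoch(edges, Z, lr, lam)
        synchronize(master, replicas)
    return master


if __name__ == "__main__":
    # Two machines; vertex 2 is touched by both, so it has two local copies.
    parts = [[(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)],
             [(3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]]
    all_edges = parts[0] + parts[1]
    before = objective(all_edges, train(parts, 6, epochs=0), 0.01)
    after = objective(all_edges, train(parts, 6), 0.01)
    print(f"objective before: {before:.4f}, after: {after:.4f}")
```

An asynchronous protocol would differ mainly in `synchronize`: replica values would be exchanged in the background while `local_sgd_epoch` keeps running. The sketch keeps the steps sequential so that it stays deterministic for a given `seed`.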


Information

Published In

ACM Other conferences
WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013
1628 pages
ISBN: 9781450320351
DOI: 10.1145/2488388

Sponsors

  • NICBR: Núcleo de Informação e Coordenação do Ponto BR
  • CGIBR: Comitê Gestor da Internet no Brasil

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asynchronous algorithms
  2. distributed optimization
  3. graph algorithms
  4. graph factorization
  5. large-scale machine learning
  6. matrix factorization

Conference

WWW '13: 22nd International World Wide Web Conference
May 13 - 17, 2013
Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 paper acceptance rate: 125 of 831 submissions, 15%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 99
  • Downloads (last 6 weeks): 8
Reflects downloads up to 11 Feb 2025

Citations

Cited By

  • (2025) Distance-Aware Learning for Inductive Link Prediction on Temporal Networks. IEEE Transactions on Neural Networks and Learning Systems, 36(1):978-990. DOI: 10.1109/TNNLS.2023.3328924. Online publication date: Jan-2025.
  • (2025) Introduction. In Graph Neural Network Methods and Applications in Scene Understanding, pages 1-24. DOI: 10.1007/978-981-97-9933-6_1. Online publication date: 4-Jan-2025.
  • (2024) Sign rank limitations for inner product graph decoders. Proceedings of the 41st International Conference on Machine Learning, pages 27118-27136. DOI: 10.5555/3692070.3693151. Online publication date: 21-Jul-2024.
  • (2024) A graph is worth K words. Proceedings of the 41st International Conference on Machine Learning, pages 14681-14701. DOI: 10.5555/3692070.3692656. Online publication date: 21-Jul-2024.
  • (2024) Hierarchical Graph Neural Network: A Lightweight Image Matching Model with Enhanced Message Passing of Local and Global Information in Hierarchical Graph Neural Networks. Information, 15(10):602. DOI: 10.3390/info15100602. Online publication date: 30-Sep-2024.
  • (2024) High-order graph fusion for multi-view clustering. SCIENTIA SINICA Informationis, 54(9):2098. DOI: 10.1360/SSI-2023-0217. Online publication date: 10-Sep-2024.
  • (2024) Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics, 25(1). DOI: 10.1186/s12859-023-05612-6. Online publication date: 2-Jan-2024.
  • (2024) Certified Unlearning for Federated Recommendation. ACM Transactions on Information Systems. DOI: 10.1145/3706419. Online publication date: 2-Dec-2024.
  • (2024) Comparing Hyperbolic Graph Embedding models on Anomaly Detection for Cybersecurity. Proceedings of the 19th International Conference on Availability, Reliability and Security, pages 1-11. DOI: 10.1145/3664476.3670445. Online publication date: 30-Jul-2024.
  • (2024) PaCEr: Network Embedding From Positional to Structural. Proceedings of the ACM Web Conference 2024, pages 2485-2496. DOI: 10.1145/3589334.3645516. Online publication date: 13-May-2024.
