Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2488388.2488433acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

WTF: the who to follow service at Twitter

Published: 13 May 2013 Publication History

Abstract

WTF ("Who to Follow") is Twitter's user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural overview and shares lessons we learned in building and running the service over the past few years. Particularly noteworthy was our design decision to process the entire Twitter graph in memory on a single server, which significantly reduced architectural complexity and allowed us to develop and deploy the service in only a few months. At the core of our architecture is Cassovary, an open-source in-memory graph processing engine we built from scratch for WTF. Besides powering Twitter's user recommendations, Cassovary is also used for search, discovery, promoted products, and other services as well. We describe and evaluate a few graph recommendation algorithms implemented in Cassovary, including a novel approach based on a combination of random walks and SALSA. Looking into the future, we revisit the design of our architecture and comment on its limitations, which are presently being addressed in a second-generation system under development.

References

[1]
L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. WSDM, 2011.
[2]
B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized PageRank. VLDB, 2010.
[3]
Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. VLDB, 2010.
[4]
R. Chen, M. Yang, X. Weng, B. Choi, B. He, and X. Li. Improving large graph processing on partitioned graphs in the cloud. SoCC, 2012.
[5]
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. KDD, 2009.
[6]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI, 2004.
[7]
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: A runtime for iterative MapReduce. MAPREDUCE Workshop, 2010.
[8]
D. Fogaras, B. Rácz, K. Csalogány, and T. Sarlós. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333--358, 2005.
[9]
A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. VLDB, 2009.
[10]
M. Hasan and M. Zaki. A survey of link prediction in social networks. In C. Aggarwal, editor, Social Network Data Analytics. Springer, 2011.
[11]
J. Huang, D. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. VLDB, 2011.
[12]
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48:96--129, 1998.
[13]
J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.
[14]
R. Kohavi, R. Henne, and D. Sommerfield. Practical guide to controlled experiments on the web: Listen to your customers not to the HiPPO. SIGKDD, 2007.
[15]
G. Lee, J. Lin, C. Liu, A. Lorek, and D. Ryaboy. The unified logging infrastructure for data analytics at Twitter. VLDB, 2012.
[16]
F. Leibert, J. Mannix, J. Lin, and B. Hamadani. Automatic management of partitioned, replicated search services. SoCC, 2011.
[17]
R. Lempel and S. Moran. SALSA: The Stochastic Approach for Link-Structure Analysis. ACM Transactions on Information Systems, 19(2):131--160, 2001.
[18]
J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.
[19]
J. Lin. MapReduce is good enough? If all you have is a hammer, throw away everything that's not a nail! arXiv:1209.2191, 2012.
[20]
J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers, 2010.
[21]
J. Lin and A. Kolcz. Large-scale machine learning at Twitter. SIGMOD, 2012.
[22]
J. Lin and M. Schatz. Design patterns for efficient graph algorithms in MapReduce. MLG Workshop, 2010.
[23]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. GraphLab: A new parallel framework for machine learning. UAI, 2010.
[24]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. VLDB, 2012.
[25]
G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. SIGMOD, 2010.
[26]
J. Mondal and A. Deshpande. Managing large dynamic graphs efficiently. SIGMOD, 2012.
[27]
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD, 2008.
[28]
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University, 1999.
[29]
L. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103--111, 1990.
[30]
R. Zadeh and A. Goel. Dimension independent similarity computation. arXiv:1206.2082, 2012.
[31]
Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. SoCC, 2011.

Cited By

View all
  • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
  • (2024)Revisiting Local PageRank Estimation on Undirected Graphs: Simple and OptimalProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671820(3036-3044)Online publication date: 25-Aug-2024
  • (2024)Topology-monitorable Contrastive Learning on Dynamic GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671777(4700-4711)Online publication date: 25-Aug-2024
  • Show More Cited By

Index Terms

  1. WTF: the who to follow service at Twitter

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    WWW '13: Proceedings of the 22nd international conference on World Wide Web
    May 2013
    1628 pages
    ISBN:9781450320351
    DOI:10.1145/2488388

    Sponsors

    • NICBR: Nucleo de Informatcao e Coordenacao do Ponto BR
    • CGIBR: Comite Gestor da Internet no Brazil

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 May 2013

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. graph processing
    2. hadoop
    3. link prediction

    Qualifiers

    • Research-article

    Conference

    WWW '13
    Sponsor:
    • NICBR
    • CGIBR
    WWW '13: 22nd International World Wide Web Conference
    May 13 - 17, 2013
    Rio de Janeiro, Brazil

    Acceptance Rates

    WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)117
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 13 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: Jan-2025
    • (2024)Revisiting Local PageRank Estimation on Undirected Graphs: Simple and OptimalProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671820(3036-3044)Online publication date: 25-Aug-2024
    • (2024)Topology-monitorable Contrastive Learning on Dynamic GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671777(4700-4711)Online publication date: 25-Aug-2024
    • (2024)Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00038(409-422)Online publication date: 13-May-2024
    • (2024)The Property Graph Data Format (PGDF)IEEE Access10.1109/ACCESS.2024.348568512(159267-159279)Online publication date: 2024
    • (2024)Randomized SearchRank: A semiclassical approach to a quantum search enginePhysical Review Research10.1103/PhysRevResearch.6.0430146:4Online publication date: 4-Oct-2024
    • (2024)Modeling disinformation networks on Twitter: structure, behavior, and impactApplied Network Science10.1007/s41109-024-00610-w9:1Online publication date: 26-Jan-2024
    • (2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
    • (2024)Mutual Interest-Based Twitter Followee Recommendation Using Latent Dirichlet Allocation Topic ModellingJournal of The Institution of Engineers (India): Series B10.1007/s40031-024-01125-9Online publication date: 9-Aug-2024
    • (2023)Evaluating graph neural networks for link predictionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666291(3853-3866)Online publication date: 10-Dec-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media