DOI: 10.1145/2365952.2365984
research-article

Scalable similarity-based neighborhood methods with MapReduce

Published: 09 September 2012
  • Abstract

    Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.
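
The idea of comparing item pairs from user-partitioned data can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names (`map_user`, `reduce_sum`, `item_cosine_similarities`), the choice of cosine similarity, and the in-memory dictionary standing in for a cluster are all assumptions made for illustration. Each user's ratings are processed independently in a map step that emits partial dot products for co-rated item pairs, and a reduce step sums them per pair.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(ratings):
    """Map step (hypothetical): from one user's {item: rating} dict, emit
    partial squared norms per item and partial dot products per item pair."""
    for item, r in ratings.items():
        yield ("norm", item), r * r
    # Sorting by item id gives each unordered pair one canonical key (i < j).
    for (i, ri), (j, rj) in combinations(sorted(ratings.items()), 2):
        yield ("dot", i, j), ri * rj


def reduce_sum(emitted):
    """Reduce step (hypothetical): sum all partial values that share a key."""
    totals = defaultdict(float)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


def item_cosine_similarities(user_ratings):
    """Cosine similarity for every item pair co-rated by at least one user."""
    emitted = (kv for ratings in user_ratings.values() for kv in map_user(ratings))
    totals = reduce_sum(emitted)
    norms = {k[1]: sqrt(v) for k, v in totals.items() if k[0] == "norm"}
    return {
        (k[1], k[2]): v / (norms[k[1]] * norms[k[2]])
        for k, v in totals.items()
        if k[0] == "dot"
    }


if __name__ == "__main__":
    # Toy data, invented for this example only.
    toy = {
        "u1": {"a": 1.0, "b": 1.0},
        "u2": {"a": 1.0, "b": 1.0, "c": 2.0},
        "u3": {"c": 1.0},
    }
    for pair, sim in sorted(item_cosine_similarities(toy).items()):
        print(pair, round(sim, 3))
```

Because `map_user` only ever looks at a single user's ratings, the input can be split by user across workers, and adding users adds independent map calls rather than requiring a scan over all users for each item. Other similarity measures could be supported in the same shape by changing what the map step emits and how the final normalization is done.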




      Published In

      RecSys '12: Proceedings of the sixth ACM conference on Recommender systems
      September 2012
      376 pages
      ISBN:9781450312707
      DOI:10.1145/2365952
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. MapReduce
      2. scalable collaborative filtering

      Qualifiers

      • Research-article

      Conference

      RecSys '12
      RecSys '12: Sixth ACM Conference on Recommender Systems
      September 9 - 13, 2012
      Dublin, Ireland

      Acceptance Rates

      RecSys '12 Paper Acceptance Rate 24 of 119 submissions, 20%;
      Overall Acceptance Rate 254 of 1,295 submissions, 20%



      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 15
      • Downloads (Last 6 weeks): 1


      Citations

      Cited By

      • (2022) VSIM. Journal of Parallel and Distributed Computing 158:C (29-46). DOI: 10.1016/j.jpdc.2021.07.009. Online publication date: 22-Apr-2022
      • (2021) Collaborative Filtering Recommendation Using Nonnegative Matrix Factorization in GPU-Accelerated Spark Platform. Scientific Programming 2021. DOI: 10.1155/2021/8841133. Online publication date: 1-Jan-2021
      • (2021) Learnings from a Retail Recommendation System on Billions of Interactions at bol.com. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (2447-2452). DOI: 10.1109/ICDE51399.2021.00277. Online publication date: Apr-2021
      • (2021) SmartDL: energy-aware decremental learning in a mobile-based federation for geo-spatial system. Neural Computing and Applications 35:5 (3677-3696). DOI: 10.1007/s00521-021-06378-9. Online publication date: 9-Aug-2021
      • (2019) Efficient Incremental Cooccurrence Analysis for Item-Based Collaborative Filtering. Proceedings of the 31st International Conference on Scientific and Statistical Database Management (61-72). DOI: 10.1145/3335783.3335784. Online publication date: 23-Jul-2019
      • (2019) An efficient parallel similarity matrix construction on MapReduce for collaborative filtering. The Journal of Supercomputing 75:1 (123-141). DOI: 10.1007/s11227-018-2271-3. Online publication date: 1-Jan-2019
      • (2019) Resource Efficiency Optimization for Big Data Mining Algorithm with Multi-MapReduce Collaboration Scenario. Intelligent Computing Methodologies (503-514). DOI: 10.1007/978-3-030-26766-7_46. Online publication date: 24-Jul-2019
      • (2019) Datasets for Business and Consumer Analytics. Business and Consumer Analytics: New Ideas (965-987). DOI: 10.1007/978-3-030-06222-4_26. Online publication date: 31-May-2019
      • (2018) Tavsiye Sistemlerinde Büyük Verinin Kullanımı Üzerine Kapsamlı Bir İnceleme. Marmara Fen Bilimleri Dergisi 30:4 (339-357). DOI: 10.7240/marufbd.440095. Online publication date: 31-Dec-2018
      • (2018) Towards Scalable Recommendation Framework with Heterogeneous Data Sources: Preliminary Results. 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) (632-636). DOI: 10.1109/SITIS.2018.00102. Online publication date: Nov-2018
