DOI: 10.1145/1718487.1718499
research-article

fLDA: matrix factorization through latent Dirichlet allocation

Published: 04 February 2010

Abstract

We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages, respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor, often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. A user's rating on an item is then modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors, respectively. We show that our model is accurate and interpretable, and that it handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
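
The prediction step described in the abstract can be illustrated with a minimal sketch. It covers only what the abstract states: item topics as the average of per-word topic assignments, a rating score as the user's affinity to those topics, and a regression on user features that supplies a prior mean for the user factors. The function names, the plain inner-product form, and the weight matrix layout are assumptions made for illustration; the full model, its other terms, and the joint fitting procedure are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence


def item_topic_proportions(word_topics: Sequence[int], num_topics: int) -> List[float]:
    """Average the per-word topic indicators of one item.

    word_topics[n] is the topic index assigned to the n-th word of the item.
    The result is the empirical topic distribution of the item.
    """
    if not word_topics:
        raise ValueError("an item needs at least one word")
    counts = Counter(word_topics)
    total = len(word_topics)
    return [counts.get(k, 0) / total for k in range(num_topics)]


def predict_rating(user_affinity: Sequence[float], item_topics: Sequence[float]) -> float:
    """Score an item as the inner product of user affinities and item topics.

    Assumption: only the affinity term from the abstract is modeled here.
    """
    if len(user_affinity) != len(item_topics):
        raise ValueError("user and item vectors must have the same length")
    return sum(s * z for s, z in zip(user_affinity, item_topics))


def prior_mean_user_affinity(
    weights: Sequence[Sequence[float]], user_features: Sequence[float]
) -> List[float]:
    """Regression prior mean for the user factors.

    Assumption: weights is a hypothetical (topics x features) matrix; for a
    user with no ratings, this mean is the natural cold-start estimate.
    """
    return [sum(w * x for w, x in zip(row, user_features)) for row in weights]


if __name__ == "__main__":
    # Four words assigned to topics 0, 0, 2, 1 out of three topics.
    z_bar = item_topic_proportions([0, 0, 2, 1], num_topics=3)
    print(z_bar)
    # Warm start: affinities learned from this user's past ratings.
    print(predict_rating([2.0, -1.0, 4.0], z_bar))
    # Cold start: fall back to the feature-based prior mean.
    s_prior = prior_mean_user_affinity([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]], [1.0, 2.0])
    print(predict_rating(s_prior, z_bar))
```

In this sketch the same scoring function serves warm-start and cold-start users; only the source of the user affinity vector changes, which is one way to read the abstract's claim of handling both scenarios through a single model.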

Information

Published In

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
February 2010
468 pages
ISBN: 9781605588896
DOI: 10.1145/1718487
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian hierarchical model
  2. collaborative filtering
  3. graphical model
  4. latent factor model
  5. recommender systems
  6. supervised topic model

Qualifiers

  • Research-article

Conference

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 91
  • Downloads (Last 6 weeks): 18
Reflects downloads up to 25 Jan 2025

Citations

Cited By

  • (2024) Deep learning with the generative models for recommender systems: A survey. Computer Science Review 53 (100646). DOI: 10.1016/j.cosrev.2024.100646. Online publication date: Aug-2024
  • (2024) Improving selection diversity using hybrid graph-based news recommenders. User Modeling and User-Adapted Interaction 34:4 (955-993). DOI: 10.1007/s11257-024-09399-w. Online publication date: 1-Sep-2024
  • (2023) Identification of Hydrogen-Energy-Related Emerging Technologies Based on Text Mining. Sustainability 16:1 (147). DOI: 10.3390/su16010147. Online publication date: 22-Dec-2023
  • (2023) Content-Based Recommender Systems Taxonomy. Foundations of Computing and Decision Sciences 48:2 (211-241). DOI: 10.2478/fcds-2023-0009. Online publication date: 30-Jun-2023
  • (2023) Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion. Computational Linguistics 50:2 (619-655). DOI: 10.1162/coli_a_00506. Online publication date: 1-Jun-2023
  • (2023) PUB-VEN: a personalized recommendation system for suggesting publication venues. Multimedia Tools and Applications 83:14 (42103-42124). DOI: 10.1007/s11042-023-16798-5. Online publication date: 14-Oct-2023
  • (2022) Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers. International Journal on Semantic Web & Information Systems 18:1 (1-24). DOI: 10.4018/IJSWIS.297032. Online publication date: 23-Feb-2022
  • (2022) Semantic Trajectory Frequent Pattern Mining Model. International Journal on Semantic Web & Information Systems 18:1 (1-20). DOI: 10.4018/IJSWIS.297031. Online publication date: 23-Feb-2022
  • (2022) A Context-Independent Ontological Linked Data Alignment Approach to Instance Matching. International Journal on Semantic Web & Information Systems 18:1 (1-29). DOI: 10.4018/IJSWIS.295977. Online publication date: 22-Feb-2022
  • (2022) Mc-DNN. International Journal on Semantic Web & Information Systems 18:1 (1-20). DOI: 10.4018/IJSWIS.295553. Online publication date: 17-Feb-2022
