DOI: 10.1145/2835776.2835781

DiFacto: Distributed Factorization Machines

Published: 08 February 2016

Abstract

Factorization Machines offer good performance and useful embeddings of data. However, they are costly to scale to large amounts of data and large numbers of features. In this paper we describe DiFacto, which uses a refined Factorization Machine model with sparse memory adaptive constraints and frequency adaptive regularization. We show how to distribute DiFacto over multiple machines using the Parameter Server framework by computing distributed subgradients on minibatches asynchronously. We analyze its convergence and demonstrate its efficiency on computational advertising datasets with billions of examples and features.
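For readers unfamiliar with the underlying model, the following is a minimal sketch of the standard second-order Factorization Machine prediction, f(x) = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j, computed with the usual linear-time identity. It is not DiFacto's implementation. The function names, the dictionary-based sparse layout, and the choice to treat a feature without an entry in `V` as having a zero embedding are assumptions made for illustration; the last one is only a loose picture of what a memory constraint on embeddings might look like, since the abstract does not spell out the mechanism.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order FM score for one sparse example.

    x  : dict mapping feature index -> value (only nonzero features)
    w0 : global bias
    w  : dict mapping feature index -> linear weight
    V  : dict mapping feature index -> embedding (list of k floats);
         features absent from V are treated as having a zero embedding
         (an illustrative assumption, not taken from the paper)
    """
    linear = sum(w.get(i, 0.0) * xi for i, xi in x.items())

    k = len(next(iter(V.values()))) if V else 0
    pairwise = 0.0
    for f in range(k):
        s = 0.0      # sum_i V[i][f] * x_i
        s_sq = 0.0   # sum_i (V[i][f] * x_i)^2
        for i, xi in x.items():
            if i in V:
                t = V[i][f] * xi
                s += t
                s_sq += t * t
        # 0.5 * (s^2 - s_sq) equals sum_{i<j} V[i][f] V[j][f] x_i x_j
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score computed directly over all feature pairs (quadratic cost)."""
    linear = sum(w.get(i, 0.0) * xi for i, xi in x.items())
    pairwise = 0.0
    for i, j in combinations(sorted(x), 2):
        if i in V and j in V:
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


if __name__ == "__main__":
    x = {0: 1.0, 3: 2.0, 7: 1.0}           # sparse example
    w = {0: 0.1, 3: -0.2, 7: 0.3}          # linear weights
    V = {0: [1.0, 2.0], 3: [0.5, -1.0]}    # feature 7 has no embedding
    print(fm_predict(x, 0.5, w, V), fm_predict_naive(x, 0.5, w, V))
```

The fast version costs time proportional to the number of nonzero features times the embedding dimension, which is why the pairwise term stays tractable for very sparse inputs; the naive version is included only as a cross-check.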


Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN: 9781450337168
DOI: 10.1145/2835776
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed systems
  2. factorization machines
  3. optimization
  4. recommender systems
  5. statistics

Qualifiers

  • Research-article

Funding Sources

  • Adobe
  • National Science Foundation

Conference

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
February 22-25, 2016
San Francisco, California, USA

Acceptance Rates

WSDM '16 paper acceptance rate: 67 of 368 submissions (18%)
Overall acceptance rate: 498 of 2,863 submissions (17%)

Article Metrics

  • Downloads (last 12 months): 193
  • Downloads (last 6 weeks): 28
Reflects downloads up to 30 Aug 2024

Cited By

  • (2023) Scaling Machine Learning with a Ring-based Distributed Framework. Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, pages 23-32. DOI: 10.1145/3638584.3638667. Online publication date: 8-Dec-2023
  • (2023) TreeSensing: Linearly Compressing Sketches with Flexibility. Proceedings of the ACM on Management of Data, 1:1, pages 1-28. DOI: 10.1145/3588910. Online publication date: 30-May-2023
  • (2023) Trusted AI in Multiagent Systems: An Overview of Privacy and Security for Distributed Learning. Proceedings of the IEEE, 111:9, pages 1097-1132. DOI: 10.1109/JPROC.2023.3306773. Online publication date: Sep-2023
  • (2023) Recent advances and future challenges in federated recommender systems. International Journal of Data Science and Analytics, 17:4, pages 337-357. DOI: 10.1007/s41060-023-00442-4. Online publication date: 25-Aug-2023
  • (2022) Fairness-aware Federated Matrix Factorization. Proceedings of the 16th ACM Conference on Recommender Systems, pages 168-178. DOI: 10.1145/3523227.3546771. Online publication date: 12-Sep-2022
  • (2021) LraSched: Admitting More Long-Running Applications via Auto-Estimating Container Size and Affinity. The Computer Journal, 65:9, pages 2377-2391. DOI: 10.1093/comjnl/bxab072. Online publication date: 31-May-2021
  • (2021) LSDDL: Layer-wise Sparsification for Distributed Deep Learning. Big Data Research, article 100272. DOI: 10.1016/j.bdr.2021.100272. Online publication date: Sep-2021
  • (2021) Vertical Federated Learning for Higher-Order Factorization Machines. Advances in Knowledge Discovery and Data Mining, pages 346-357. DOI: 10.1007/978-3-030-75765-6_28. Online publication date: 8-May-2021
  • (2020) Distributed Equivalent Substitution Training for Large-Scale Recommender Systems. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 911-920. DOI: 10.1145/3397271.3401113. Online publication date: 25-Jul-2020
  • (2020) Interpretable Click-Through Rate Prediction through Hierarchical Attention. Proceedings of the 13th International Conference on Web Search and Data Mining, pages 313-321. DOI: 10.1145/3336191.3371785. Online publication date: 20-Jan-2020
