DiFacto: Distributed Factorization Machines

Authors:

Alexander J. Smola,

Yu-Xiang Wang

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

Pages 377 - 386

https://doi.org/10.1145/2835776.2835781

Published: 08 February 2016

Abstract

Factorization Machines offer good performance and useful embeddings of data. However, they are costly to scale to large amounts of data and large numbers of features. In this paper we describe DiFacto, which uses a refined Factorization Machine model with sparse memory-adaptive constraints and frequency-adaptive regularization. We show how to distribute DiFacto over multiple machines using the Parameter Server framework by computing distributed subgradients on minibatches asynchronously. We analyze its convergence and demonstrate its efficiency on computational advertising datasets with billions of examples and features.
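For orientation, a standard second-order Factorization Machine (Rendle's formulation; the abstract does not spell out DiFacto's refined model) scores a sparse example x as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where each feature i has a weight w_i and a k-dimensional embedding v_i. The pairwise term can be evaluated in time linear in the number of nonzero features using the identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2]. The Python below is a minimal sketch of that scoring step only, not DiFacto's implementation. The dict-based sparse layout and the names `fm_predict` and `fm_predict_naive` are hypothetical, and treating a feature with no stored embedding as a zero vector is only a loose stand-in for storing embeddings for a subset of features, not the paper's memory-adaptive rule.

```python
from typing import Dict, List

# Hypothetical sparse representation: feature id -> feature value.
SparseRow = Dict[int, float]


def fm_predict(x: SparseRow, w0: float, w: Dict[int, float],
               V: Dict[int, List[float]], k: int) -> float:
    """Second-order FM score in O(k * nnz(x)) time.

    Uses sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2].
    Features missing from `w` or `V` contribute zero (assumption: parameters
    are stored sparsely and only for some features).
    """
    score = w0 + sum(w.get(i, 0.0) * xi for i, xi in x.items())
    for f in range(k):
        s = 0.0     # sum_i v_{i,f} x_i
        s_sq = 0.0  # sum_i (v_{i,f} x_i)^2
        for i, xi in x.items():
            v = V.get(i)
            if v is None:  # no embedding stored for this feature
                continue
            t = v[f] * xi
            s += t
            s_sq += t * t
        score += 0.5 * (s * s - s_sq)
    return score


def fm_predict_naive(x: SparseRow, w0: float, w: Dict[int, float],
                     V: Dict[int, List[float]], k: int) -> float:
    """Reference O(k * nnz(x)^2) evaluation of the same model, for checking."""
    score = w0 + sum(w.get(i, 0.0) * xi for i, xi in x.items())
    items = [(i, xi) for i, xi in x.items() if i in V]
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            (i, xi), (j, xj) = items[a], items[b]
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * xi * xj
    return score


if __name__ == "__main__":
    x = {0: 1.0, 1: 1.0, 2: 2.0}
    w0, w = 0.1, {0: 0.5, 1: -0.25}
    V = {0: [1.0, 0.0], 1: [0.5, 2.0], 2: [-1.0, 1.0]}
    print(fm_predict(x, w0, w, V, k=2))        # approximately 1.85
    print(fm_predict_naive(x, w0, w, V, k=2))  # same value
```

Both functions print approximately 1.85 for the toy example; the naive version exists only to check the linear-time identity, which is what makes FM scoring cheap on very sparse inputs such as advertising logs.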

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction
2. Mathematics of computing
  1. Probability and statistics
    1. Nonparametric statistics

Recommendations

Convex factorization machines
ECMLPKDD'15: Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II

Factorization machines are a generic framework which allows to mimic many factorization models simply by feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. ...
An Elementary View on Factorization Machines
RecSys '17: Proceedings of the Eleventh ACM Conference on Recommender Systems

Factorization Machines (FMs) are a model class capable of learning pairwise (and in general higher order) feature interactions from high dimensional, sparse data. In this paper we adopt an elementary view on FMs. Specifically, we view FMs as a sum of ...
Verification of Usefulness of Student Modeling with Real Educational Data using Convex Factorization Machines
Extracting useful information generated from educational settings involves the application of data mining, machine learning, and statistics to the large amount of electronic data collected by educational systems. To generate better higher learning ...

Information & Contributors

Information

Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

February 2016

746 pages

ISBN:9781450337168

DOI:10.1145/2835776

General Chairs: Paul N. Bennett (Microsoft Research), Vanja Josifovski (Pinterest)

Program Chairs: Jennifer Neville (Purdue University), Filip Radlinski (Microsoft)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Adobe
National Science Foundation

Conference

WSDM 2016

Sponsor:

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining

February 22 - 25, 2016

San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 36
Total Downloads: 1,279

Downloads (Last 12 months): 193
Downloads (Last 6 weeks): 28

Reflects downloads up to 30 Aug 2024

Citations

Cited By

Zhao K, Leng Y, Zhang H (2023). Scaling Machine Learning with a Ring-based Distributed Framework. Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, pp. 23-32. DOI: 10.1145/3638584.3638667. Online publication date: 8-Dec-2023.
https://dl.acm.org/doi/10.1145/3638584.3638667
Liu Z, Zhang Y, Zhu Y, Zhang R, Yang T, Xie K, Wang S, Li T, Cui B (2023). TreeSensing: Linearly Compressing Sketches with Flexibility. Proceedings of the ACM on Management of Data 1:1, pp. 1-28. DOI: 10.1145/3588910. Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1145/3588910
Ma C, Li J, Wei K, Liu B, Ding M, Yuan L, Han Z, Vincent Poor H (2023). Trusted AI in Multiagent Systems: An Overview of Privacy and Security for Distributed Learning. Proceedings of the IEEE 111:9, pp. 1097-1132. DOI: 10.1109/JPROC.2023.3306773. Online publication date: Sep-2023.
https://doi.org/10.1109/JPROC.2023.3306773
Harasic M, Keese F, Mattern D, Paschke A (2023). Recent advances and future challenges in federated recommender systems. International Journal of Data Science and Analytics 17:4, pp. 337-357. DOI: 10.1007/s41060-023-00442-4. Online publication date: 25-Aug-2023.
https://doi.org/10.1007/s41060-023-00442-4
Liu S, Ge Y, Xu S, Zhang Y, Marian A (2022). Fairness-aware Federated Matrix Factorization. Proceedings of the 16th ACM Conference on Recommender Systems, pp. 168-178. DOI: 10.1145/3523227.3546771. Online publication date: 12-Sep-2022.
https://dl.acm.org/doi/10.1145/3523227.3546771
Cai B, Guo Q, Yu J (2021). LraSched: Admitting More Long-Running Applications via Auto-Estimating Container Size and Affinity. The Computer Journal 65:9, pp. 2377-2391. DOI: 10.1093/comjnl/bxab072. Online publication date: 31-May-2021.
https://doi.org/10.1093/comjnl/bxab072
Hong Y, Han P (2021). LSDDL: Layer-wise Sparsification for Distributed Deep Learning. Big Data Research, article 100272. DOI: 10.1016/j.bdr.2021.100272. Online publication date: Sep-2021.
https://doi.org/10.1016/j.bdr.2021.100272
Atarashi K, Ishihata M (2021). Vertical Federated Learning for Higher-Order Factorization Machines. Advances in Knowledge Discovery and Data Mining, pp. 346-357. DOI: 10.1007/978-3-030-75765-6_28. Online publication date: 8-May-2021.
https://doi.org/10.1007/978-3-030-75765-6_28
Rong H, Wang Y, Zhou F, Zhai J, Wu H, Lan R, Li F, Zhang H, Yang Y, Guo Z, Wang D, Huang J, Chang Y, Cheng X, Kamps J, Murdock V, Wen J, Liu Y (2020). Distributed Equivalent Substitution Training for Large-Scale Recommender Systems. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 911-920. DOI: 10.1145/3397271.3401113. Online publication date: 25-Jul-2020.
https://dl.acm.org/doi/10.1145/3397271.3401113
Li Z, Cheng W, Chen Y, Chen H, Wang W, Caverlee J, Hu X, Lalmas M, Wang W (2020). Interpretable Click-Through Rate Prediction through Hierarchical Attention. Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 313-321. DOI: 10.1145/3336191.3371785. Online publication date: 20-Jan-2020.
https://dl.acm.org/doi/10.1145/3336191.3371785
