
MSGD: A Novel Matrix Factorization Approach for Large-Scale Collaborative Filtering Recommender Systems on GPUs

Published: 01 July 2018

Abstract

Real-time, accurate recommendation in large-scale recommender systems is a challenging task. Matrix factorization (MF), one of the most accurate and scalable techniques for predicting missing ratings, has become popular in the collaborative filtering (CF) community. Stochastic gradient descent (SGD) is currently one of the best-known approaches to MF. However, it is non-trivial to parallelize SGD for large-scale CF MF problems: each update depends on a (user, item) pair, so concurrent updates can over-write one another. To remove this dependence on the user and item pair, we propose a multi-stream SGD (MSGD) approach whose update process is theoretically convergent. On that basis, we propose a Compute Unified Device Architecture (CUDA) parallelization of MSGD (CUMSGD), which obtains high parallelism and scalability on Graphics Processing Units (GPUs). On Tesla K20m and K40c GPUs, experimental results show that CUMSGD outperforms prior work that accelerated MF on shared-memory systems, e.g., DSGD, FPSGD, Hogwild!, and CCD++. For large-scale CF problems, we propose a multi-GPU extension of CUMSGD (MCUMSGD), which the experiments show improves performance further. With a single K20m GPU card, CUMSGD is 5-10 times as fast as the state-of-the-art approaches on shared-memory platforms.
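The parallelization hazard the abstract describes can be seen in the baseline SGD update for MF itself: every observed rating updates both the user factor row and the item factor row, so two threads handling ratings that share a user or an item may over-write each other's updates. Below is a minimal sequential sketch of that baseline update (the function name and toy data are illustrative, not from the paper; MSGD's decoupled multi-stream update is not reproduced here):

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05, epochs=300, seed=0):
    """Sequential SGD for matrix factorization R ~= P @ Q.T.

    Each observed rating (u, i, r) updates BOTH row P[u] and row Q[i].
    This shared dependence on the (user, item) pair is what makes naive
    multi-threaded SGD prone to over-writing, which motivates schemes
    such as MSGD that remove the dependence.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            # Both factor rows are written for a single rating:
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])  # uses the updated P[u]
    return P, Q

# Toy example: 3 users, 3 items, 4 observed ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 2, 1.0)]
P, Q = sgd_mf(ratings, n_users=3, n_items=3)
```

Note how user 0 appears in two ratings and item 0 in two others: any pair of threads assigned those ratings would race on `P[0]` or `Q[0]`, which is exactly the over-writing problem the paper's multi-stream decomposition is designed to avoid.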

References

[1]
Y. Cai, H.-F. Leung, Q. Li, H. Min, J. Tang, and J. Li, “Typicality-based collaborative filtering recommendation,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 3, pp. 766–779, Mar. 2014.
[2]
G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, Jun. 2005.
[3]
N. Srebro, J. Rennie, and T. S. Jaakkola, “Maximum-margin matrix factorization,” in Proc. Advances Neural Inf. Process. Syst., 2004, pp. 1329–1336.
[4]
Y. Koren, “Factor in the neighbors: Scalable and accurate collaborative filtering,” ACM Trans. Knowl. Discovery Data, vol. 4, no. 1, 2010, Art. no.
[5]
Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[6]
J. Bennett and S. Lanning, “The netflix prize,” in Proc. KDD Cup Workshop, 2007, vol. 2007, Art. no.
[7]
N. Srebro, “Learning with matrix factorizations,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, USA, 2004.
[8]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the netflix prize,” in Algorithmic Aspects in Information and Management . Berlin, Germany: Springer, 2008, pp. 337–348.
[9]
W. Tan, L. Cao, and L. Fong, “Faster and cheaper: Parallelizing large-scale matrix factorization on GPUs,” in Proc. 25th ACM Int. Symp. High-Performance Parallel Distrib. Comput., 2016, pp. 219–230.
[10]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. Dhillon, “Scalable coordinate descent approaches to parallel matrix factorization for recommender systems,” in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 765–774.
[11]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon, “Parallel matrix factorization for recommender systems,” Knowl. Inf. Syst., vol. 41, no. 3, pp. 793–819, 2014.
[12]
C.-J. Hsieh and I. S. Dhillon, “Fast coordinate descent methods with variable selection for non-negative matrix factorization,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 1064–1072.
[13]
D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Proc. Advances Neural Inf. Process. Syst., 2001, pp. 556–562.
[14]
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A fast parallel stochastic gradient method for matrix factorization in shared memory systems,” ACM Trans. Intell. Syst. Technol., vol. 6, no. 1, 2015, Art. no.
[15]
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A learning-rate schedule for stochastic gradient methods to matrix factorization,” in Proc. Pacific-Asia Conf. Knowl. Discovery Data Mining, 2015, pp. 442–455.
[16]
R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, “Large-scale matrix factorization with distributed stochastic gradient descent,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 69–77.
[17]
C. Teflioudi, F. Makari, and R. Gemulla, “Distributed matrix completion,” in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 655–664.
[18]
F. Niu, B. Recht, C. Re, and S. Wright, “HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent,” in Proc. Advances Neural Inf. Process. Syst., 2011, pp. 693–701.
[19]
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Application of dimensionality reduction in recommender system: A case study,” Dept. Comput. Sci., Univ. Minnesota, Minneapolis, MN, USA, Tech. Rep., 2000, http://www.dtic.mil/docs/citations/ADA439541
[20]
M.-F. Weng and Y.-Y. Chuang, “Collaborative video reindexing via matrix factorization,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 8, no. 2, 2012, Art. no.
[21]
Z. Zheng, H. Ma, M. R. Lyu, and I. King, “Collaborative web service QoS prediction via neighborhood integrated matrix factorization,” IEEE Trans. Services Comput., vol. 6, no. 3, pp. 289–299, Jul.–Sep. 2013.
[22]
D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui, “GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 831–840.
[23]
J. Langford, M. Zinkevich, and A. J. Smola, “Slow learners are fast,” in Proc. Advances Neural Inf. Process. Syst., 2009, pp. 2331–2339.
[24]
M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, “Parallelized stochastic gradient descent,” in Proc. Advances Neural Inf. Process. Syst., 2010, pp. 2595–2603.
[25]
A. Agarwal and J. C. Duchi, “Distributed delayed stochastic optimization,” in Proc. Advances Neural Inf. Process. Syst., 2011, pp. 873–881.
[26]
H. Yun, H.-F. Yu, C.-J. Hsieh, S. Vishwanathan, and I. Dhillon, “NOMAD: Non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion,” Proc. VLDB Endowment, vol. 7, no. 11, pp. 975–986, 2014.
[27]
J. Jin, S. Lai, S. Hu, J. Lin, and X. Lin, “GPUSGD: A GPU-accelerated stochastic gradient descent algorithm for matrix factorization,” Concurrency Comput.: Practice Experience, vol. 28, pp. 3844–3865, 2015.
[28]
J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU computing,” Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
[29]
G. Pratx and L. Xing, “GPU computing in medical physics: A review,” Med. Physics, vol. 38, no. 5, pp. 2685–2697, 2011.
[30]
P. Guo, L. Wang, and P. Chen, “A performance modeling and optimization analysis tool for sparse matrix-vector multiplication on GPUs,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 5, pp. 1112–1123, May 2014.
[31]
K. Li, W. Yang, and K. Li, “Performance analysis and optimization for SpMV on GPU using probabilistic modeling,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 196–205, Jan. 2015.
[32]
Z. Gao, Y. Liang, and Y. Jiang, “Implement of item-based recommendation on GPU,” in Proc. IEEE 2nd Int. Conf. Cloud Comput. Intell. Syst., 2012, vol. 2, pp. 587–590.
[33]
K. Kato and T. Hosino, “Singular value decomposition for collaborative filtering on a GPU,” in Proc. IOP Conf. Series: Mater. Sci. Eng., 2010, pp. 012–017.
[34]
A. Cano and C. Garcia-Martinez, “100 million dimensions large-scale global optimization using distributed GPU computing,” in Proc. IEEE Congr. Evol. Comput., 2016, pp. 3566–3573.
[35]
W. Yang, K. Li, Z. Mo, and K. Li, “Performance optimization using partitioned SpMV on GPUs and multicore CPUs,” IEEE Trans. Comput., vol. 64, no. 9, pp. 2623–2636, Sep. 2015.
[36]
J. A. Stuart and J. D. Owens, “Multi-GPU MapReduce on GPU clusters,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2011, pp. 1068–1079.
[37]
C. Chen, K. Li, A. Ouyang, and K. Li, “GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data,” in Proc. IEEE 45th Int. Conf. Parallel Process., 2016, pp. 542–551.
[38]
L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. Int. Conf. Comput. Statist., 2010, pp. 177–186.
[39]
R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 315–323.
[40]
T. Zhang, “Solving large scale linear prediction problems using stochastic gradient descent algorithms,” in Proc. 21st Int. Conf. Mach. Learn., 2004, Art. no. 116.
[41]
N. L. Roux, M. Schmidt, and F. R. Bach, “A stochastic gradient method with an exponential convergence rate for finite training sets,” in Proc. Advances Neural Inf. Process. Syst., 2012, pp. 2663–2671.
[42]
S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, “Stochastic variance reduction for nonconvex optimization,” in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 314–323.
[43]
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM J. Optimization, vol. 19, no. 4, pp. 1574–1609, 2009.
[44]
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Berlin, Germany: Springer, 2013.
[45]
NVIDIA, “CUDA C Programming Guide, v7.,” Oct. 2015.
[46]
S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos, “Sparse Bayesian methods for low-rank matrix estimation,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 3964–3977, Aug. 2012.
[47]
J. Ye, J.-H. Chow, J. Chen, and Z. Zheng, “Stochastic gradient boosted distributed decision trees,” in Proc. 18th ACM Conf. Inf. Knowl. Manage., 2009, pp. 2061–2064.
[48]
J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue, vol. 6, no. 2, pp. 40–53, 2008.
[49]
E. Hazan, “Introduction to online convex optimization,” Found. Trends® Optimization, vol. 2, no. 3/4, pp. 157–325, 2016.


    Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 29, Issue 7
    July 2018
    236 pages

    Publisher

    IEEE Press


    Qualifiers

    • Research-article


    Cited By

    • (2024) Negative Sampling in Next-POI Recommendations: Observation, Approach, and Evaluation. Proc. ACM Web Conf. 2024, pp. 3888–3899. DOI: 10.1145/3589334.3645681. Online publication date: 13-May-2024.
    • (2024) Parallel Fractional Stochastic Gradient Descent With Adaptive Learning for Recommender Systems. IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 3, pp. 470–483. DOI: 10.1109/TPDS.2022.3185212. Online publication date: 1-Mar-2024.
    • (2024) Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation. IEEE Transactions on Fuzzy Systems, vol. 32, no. 2, pp. 445–459. DOI: 10.1109/TFUZZ.2023.3300370. Online publication date: 1-Feb-2024.
    • (2024) Stochastic Gradient Descent for matrix completion. Knowledge-Based Systems, vol. 283. DOI: 10.1016/j.knosys.2023.111176. Online publication date: 11-Jan-2024.
    • (2024) Integrating user-side information into matrix factorization to address data sparsity of collaborative filtering. Multimedia Systems, vol. 30, no. 2. DOI: 10.1007/s00530-024-01261-8. Online publication date: 18-Feb-2024.
    • (2023) HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. Proc. 37th International Conference on Supercomputing, pp. 324–335. DOI: 10.1145/3577193.3593717. Online publication date: 21-Jun-2023.
    • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, vol. 55, no. 11, pp. 1–81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023.
    • (2023) UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization. IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 11, pp. 2978–2993. DOI: 10.1109/TPDS.2023.3317535. Online publication date: 1-Nov-2023.
    • (2023) Generalized zeroing neural dynamics model for online solving time-varying cube roots problem with various external disturbances in different domains. Information Sciences, vol. 623, pp. 648–670. DOI: 10.1016/j.ins.2022.12.058. Online publication date: 1-Apr-2023.
    • (2023) Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering. Future Generation Computer Systems, vol. 146, pp. 207–221. DOI: 10.1016/j.future.2023.04.007. Online publication date: 1-Sep-2023.
