
MSGD: A Novel Matrix Factorization Approach for Large-Scale Collaborative Filtering Recommender Systems on GPUs

Published: 01 July 2018

Abstract

Real-time, accurate recommendation in large-scale recommender systems is a challenging task. Matrix factorization (MF), one of the most accurate and scalable techniques for predicting missing ratings, has become popular in the collaborative filtering (CF) community. Stochastic gradient descent (SGD) is currently one of the best-known approaches to MF. However, it is non-trivial to parallelize SGD for large-scale CF MF problems: each update depends on a (user, item) pair, so concurrent updates can over-write one another. To remove this dependence on the user and item pair, we propose a multi-stream SGD (MSGD) approach whose update process is theoretically convergent. On that basis, we propose a Compute Unified Device Architecture (CUDA) parallelization of MSGD (CUMSGD), which obtains high parallelism and scalability on Graphics Processing Units (GPUs). On Tesla K20m and K40c GPUs, experimental results show that CUMSGD outperforms prior work that accelerated MF on shared-memory systems, e.g., DSGD, FPSGD, Hogwild!, and CCD++. For large-scale CF problems, we propose a multi-GPU extension of CUMSGD (MCUMSGD), which the experiments show improves performance further. With a single K20m GPU card, CUMSGD is 5-10 times as fast as the state-of-the-art approaches on shared-memory platforms.
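The parallelization hazard the abstract describes can be seen in the baseline SGD update for MF itself: every observed rating updates both the user factor row and the item factor row, so two threads handling ratings that share a user or an item may over-write each other's updates. Below is a minimal sequential sketch of that baseline update (the function name and toy data are illustrative, not from the paper; MSGD's decoupled multi-stream update is not reproduced here):

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05, epochs=300, seed=0):
    """Sequential SGD for matrix factorization R ~= P @ Q.T.

    Each observed rating (u, i, r) updates BOTH row P[u] and row Q[i].
    This shared dependence on the (user, item) pair is what makes naive
    multi-threaded SGD prone to over-writing, which motivates schemes
    such as MSGD that remove the dependence.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            # Both factor rows are written for a single rating:
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])  # uses the updated P[u]
    return P, Q

# Toy example: 3 users, 3 items, 4 observed ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 2, 1.0)]
P, Q = sgd_mf(ratings, n_users=3, n_items=3)
```

Note how user 0 appears in two ratings and item 0 in two others: any pair of threads assigned those ratings would race on `P[0]` or `Q[0]`, which is exactly the over-writing problem the paper's multi-stream decomposition is designed to avoid.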

References

[1]
Y. Cai, H.-F. Leung, Q. Li, H. Min, J. Tang, and J. Li, “Typicality-based collaborative filtering recommendation,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 3, pp. 766–779, Mar. 2014.
[2]
G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, Jun. 2005.
[3]
N. Srebro, J. Rennie, and T. S. Jaakkola, “Maximum-margin matrix factorization,” in Proc. Advances Neural Inf. Process. Syst., 2004, pp. 1329–1336.
[4]
Y. Koren, “Factor in the neighbors: Scalable and accurate collaborative filtering,” ACM Trans. Knowl. Discovery Data, vol. 4, no. 1, 2010, Art. no.
[5]
Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009.
[6]
J. Bennett and S. Lanning, “The netflix prize,” in Proc. KDD Cup Workshop, 2007, vol. 2007, Art. no.
[7]
N. Srebro, “Learning with matrix factorizations,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, USA, 2004.
[8]
Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the netflix prize,” in Algorithmic Aspects in Information and Management . Berlin, Germany: Springer, 2008, pp. 337–348.
[9]
W. Tan, L. Cao, and L. Fong, “Faster and cheaper: Parallelizing large-scale matrix factorization on GPUs,” in Proc. 25th ACM Int. Symp. High-Performance Parallel Distrib. Comput., 2016, pp. 219–230.
[10]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. Dhillon, “Scalable coordinate descent approaches to parallel matrix factorization for recommender systems,” in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 765–774.
[11]
H.-F. Yu, C.-J. Hsieh, S. Si, and I. S. Dhillon, “Parallel matrix factorization for recommender systems,” Knowl. Inf. Syst., vol. 41, no. 3, pp. 793–819, 2014.
[12]
C.-J. Hsieh and I. S. Dhillon, “Fast coordinate descent methods with variable selection for non-negative matrix factorization,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 1064–1072.
[13]
D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Proc. Advances Neural Inf. Process. Syst., 2001, pp. 556–562.
[14]
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A fast parallel stochastic gradient method for matrix factorization in shared memory systems,” ACM Trans. Intell. Syst. Technol., vol. 6, no. 1, 2015, Art. no.
[15]
W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin, “A learning-rate schedule for stochastic gradient methods to matrix factorization,” in Proc. Pacific-Asia Conf. Knowl. Discovery Data Mining, 2015, pp. 442–455.
[16]
R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, “Large-scale matrix factorization with distributed stochastic gradient descent,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 69–77.
[17]
C. Teflioudi, F. Makari, and R. Gemulla, “Distributed matrix completion,” in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 655–664.
[18]
F. Niu, B. Recht, C. Re, and S. Wright, “HOGWILD!: A lock-free approach to parallelizing stochastic gradient descent,” in Proc. Advances Neural Inf. Process. Syst., 2011, pp. 693–701.
[19]
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Application of dimensionality reduction in recommender system: A case study,” Dept. Comput. Sci., Univ. Minnesota, Minneapolis, MN, USA, Tech. Rep., 2000, http://www.dtic.mil/docs/citations/ADA439541
[20]
M.-F. Weng and Y.-Y. Chuang, “Collaborative video reindexing via matrix factorization,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 8, no. 2, 2012, Art. no.
[21]
Z. Zheng, H. Ma, M. R. Lyu, and I. King, “Collaborative web service QoS prediction via neighborhood integrated matrix factorization,” IEEE Trans. Services Comput., vol. 6, no. 3, pp. 289–299, Jul.–Sep. 2013.
[22]
D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui, “GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 831–840.
[23]
J. Langford, M. Zinkevich, and A. J. Smola, “Slow learners are fast,” in Proc. Advances Neural Inf. Process. Syst., 2009, pp. 2331–2339.
[24]
M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, “Parallelized stochastic gradient descent,” in Proc. Advances Neural Inf. Process. Syst., 2010, pp. 2595–2603.
[25]
A. Agarwal and J. C. Duchi, “Distributed delayed stochastic optimization,” in Proc. Advances Neural Inf. Process. Syst., 2011, pp. 873–881.
[26]
H. Yun, H.-F. Yu, C.-J. Hsieh, S. Vishwanathan, and I. Dhillon, “NOMAD: Non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion,” Proc. VLDB Endowment, vol. 7, no. 11, pp. 975–986, 2014.
[27]
J. Jin, S. Lai, S. Hu, J. Lin, and X. Lin, “GPUSGD: A GPU-accelerated stochastic gradient descent algorithm for matrix factorization,” Concurrency Comput.: Practice Experience, vol. 28, pp. 3844–3865, 2015.
[28]
J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU computing,” Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
[29]
G. Pratx and L. Xing, “GPU computing in medical physics: A review,” Med. Physics, vol. 38, no. 5, pp. 2685–2697, 2011.
[30]
P. Guo, L. Wang, and P. Chen, “A performance modeling and optimization analysis tool for sparse matrix-vector multiplication on GPUs,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 5, pp. 1112–1123, May 2014.
[31]
K. Li, W. Yang, and K. Li, “Performance analysis and optimization for SpMV on GPU using probabilistic modeling,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 1, pp. 196–205, Jan. 2015.
[32]
Z. Gao, Y. Liang, and Y. Jiang, “Implement of item-based recommendation on GPU,” in Proc. IEEE 2nd Int. Conf. Cloud Comput. Intell. Syst., 2012, vol. 2, pp. 587–590.
[33]
K. Kato and T. Hosino, “Singular value decomposition for collaborative filtering on a GPU,” in Proc. IOP Conf. Series: Mater. Sci. Eng., 2010, pp. 012–017.
[34]
A. Cano and C. Garcia-Martinez, “100 million dimensions large-scale global optimization using distributed GPU computing,” in Proc. IEEE Congr. Evol. Comput., 2016, pp. 3566–3573.
[35]
W. Yang, K. Li, Z. Mo, and K. Li, “Performance optimization using partitioned SpMV on GPUs and multicore CPUs,” IEEE Trans. Comput., vol. 64, no. 9, pp. 2623–2636, Sep. 2015.
[36]
J. A. Stuart and J. D. Owens, “Multi-GPU MapReduce on GPU clusters,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2011, pp. 1068–1079.
[37]
C. Chen, K. Li, A. Ouyang, and K. Li, “GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data,” in Proc. IEEE 45th Int. Conf. Parallel Process., 2016, pp. 542–551.
[38]
L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. Int. Conf. Comput. Statist., 2010, pp. 177–186.
[39]
R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 315–323.
[40]
T. Zhang, “Solving large scale linear prediction problems using stochastic gradient descent algorithms,” in Proc. 21st Int. Conf. Mach. Learn., 2004, Art. no. 116.
[41]
N. L. Roux, M. Schmidt, and F. R. Bach, “A stochastic gradient method with an exponential convergence rate for finite training sets,” in Proc. Advances Neural Inf. Process. Syst., 2012, pp. 2663–2671.
[42]
S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, “Stochastic variance reduction for nonconvex optimization,” in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 314–323.
[43]
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM J. Optimization, vol. 19, no. 4, pp. 1574–1609, 2009.
[44]
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Berlin, Germany: Springer, 2013.
[45]
NVIDIA, “CUDA C Programming Guide, v7.,” Oct. 2015.
[46]
S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos, “Sparse Bayesian methods for low-rank matrix estimation,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 3964–3977, Aug. 2012.
[47]
J. Ye, J.-H. Chow, J. Chen, and Z. Zheng, “Stochastic gradient boosted distributed decision trees,” in Proc. 18th ACM Conf. Inf. Knowl. Manage., 2009, pp. 2061–2064.
[48]
J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue, vol. 6, no. 2, pp. 40–53, 2008.
[49]
E. Hazan, “Introduction to online convex optimization,” Found. Trends® Optimization, vol. 2, no. 3/4, pp. 157–325, 2016.


    Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 29, Issue 7
    July 2018
    236 pages

    Publisher

    IEEE Press


    Qualifiers

    • Research-article


    Cited By

    • (2024) Negative Sampling in Next-POI Recommendations: Observation, Approach, and Evaluation. Proc. ACM Web Conf. 2024, pp. 3888–3899. DOI: 10.1145/3589334.3645681. Online publication date: 13-May-2024.
    • (2024) Parallel Fractional Stochastic Gradient Descent With Adaptive Learning for Recommender Systems. IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 3, pp. 470–483. DOI: 10.1109/TPDS.2022.3185212. Online publication date: 1-Mar-2024.
    • (2024) Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation. IEEE Transactions on Fuzzy Systems, vol. 32, no. 2, pp. 445–459. DOI: 10.1109/TFUZZ.2023.3300370. Online publication date: 1-Feb-2024.
    • (2024) Stochastic Gradient Descent for matrix completion. Knowledge-Based Systems, vol. 283. DOI: 10.1016/j.knosys.2023.111176. Online publication date: 11-Jan-2024.
    • (2024) Integrating user-side information into matrix factorization to address data sparsity of collaborative filtering. Multimedia Systems, vol. 30, no. 2. DOI: 10.1007/s00530-024-01261-8. Online publication date: 18-Feb-2024.
    • (2023) HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. Proc. 37th International Conference on Supercomputing, pp. 324–335. DOI: 10.1145/3577193.3593717. Online publication date: 21-Jun-2023.
    • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, vol. 55, no. 11, pp. 1–81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023.
    • (2023) UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization. IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 11, pp. 2978–2993. DOI: 10.1109/TPDS.2023.3317535. Online publication date: 1-Nov-2023.
    • (2023) Generalized zeroing neural dynamics model for online solving time-varying cube roots problem with various external disturbances in different domains. Information Sciences, vol. 623, pp. 648–670. DOI: 10.1016/j.ins.2022.12.058. Online publication date: 1-Apr-2023.
    • (2023) Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering. Future Generation Computer Systems, vol. 146, pp. 207–221. DOI: 10.1016/j.future.2023.04.007. Online publication date: 1-Sep-2023.
