Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3159652.3159662acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article

DSANLS: Accelerating Distributed Nonnegative Matrix Factorization via Sketching

Published: 02 February 2018 Publication History

Abstract

Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, such that m ≈ UV⊥. There is an increasing interest in parallel and distributed NMF algorithms, due to the high cost of centralized NMF on large matrices. In this paper, we propose a distributed sketched alternating nonnegative least squares(DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of nonnegative least squares subproblems in each iteration for U and V. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-art distributed NMF MPI implementation.

References

[1]
N. Ailon and B. Chazelle. Approximate nearest neighbors and the fast johnson-lindenstrauss transform. In STOC, pages 557--563. ACM, 2006.
[2]
A. B. Chan, V. Mahadevan, and N. Vasconcelos. Generalized stauffer--grimson background subtraction for dynamic scenes. Machine Vision and Applications, 22(5):751--766, 2011.
[3]
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3--15, 2004.
[4]
A. Cichocki and P. Anh-Huy. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE transactions on fundamentals of electronics, communications and computer sciences, 92(3):708--721, 2009.
[5]
K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, pages 81--90. ACM, 2013.
[6]
M. E. Daube-Witherspoon and G. Muehllehner. An iterative image space reconstruction algorthm suitable for volume ect. IEEE transactions on medical imaging, 5(2):61--66, 1986.
[7]
J. P. Fairbanks, R. Kannan, H. Park, and D. A. Bader. Behavioral clusters in dynamic graphs. Parallel Computing, 47:38--50, 2015.
[8]
N. Gillis. The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12(257), 2014.
[9]
R. M. Gower and P. Richtárik. Randomized iterative methods for linear systems. SIAM Journal on Matrix Analysis and Applications, 36(4):1660--1690, 2015.
[10]
L. Grippo and M. Sciandrone. On the convergence of the block nonlinear gauss--seidel method under convex constraints. Operations research letters, 26(3):127--136, 2000.
[11]
D. Grove, J. Milthorpe, and O. Tardieu. Supporting array programming in x10. In ARRAY, page 38. ACM, 2014.
[12]
N. Guan, D. Tao, Z. Luo, and B. Yuan. Nenmf: an optimal gradient method for nonnegative matrix factorization. IEEE Transactions on Signal Processing, 60(6):2882--2898, 2012.
[13]
K. Kanjani. Parallel non negative matrix factorization for document clustering. CPSC-659 (Parallel and Distributed Numerical Algorithms) course. Texas A&M University, Tech. Rep, 2007.
[14]
R. Kannan, G. Ballard, and H. Park. A high-performance parallel algorithm for nonnegative matrix factorization. In PPoPP, page 9. ACM, 2016.
[15]
R. Kannan, G. Ballard, and H. Park. Mpi-faun: An mpi-based framework for alternating-updating nonnegative matrix factorization. arXiv preprint arXiv:1609.09154, 2016.
[16]
H. Kim and H. Park. Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM journal on matrix analysis and applications, 30(2):713--730, 2008.
[17]
J. Kim, Y. He, and H. Park. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization, 58(2):285--319, 2014.
[18]
J. Kim and H. Park. Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing, 33(6):3261--3281, 2011.
[19]
I. Kotsia, S. Zafeiriou, and I. Pitas. A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems. IEEE Transactions on Information Forensics and Security, 2(3):588--595, 2007.
[20]
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788--791, 1999.
[21]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556--562, 2001.
[22]
D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. JMLR, 5(Apr):361--397, 2004.
[23]
R. Liao, Y. Zhang, J. Guan, and S. Zhou. Cloudnmf: a mapreduce implementation of nonnegative matrix factorization for large-scale biological datasets. Genomics, proteomics & bioinformatics, 12(1):48--51, 2014.
[24]
C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural computation, 19(10):2756--2779, 2007.
[25]
C. Liu, H.-c. Yang, J. Fan, L.-W. He, and Y.-M. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. In WWW, pages 681--690. ACM, 2010.
[26]
Y. Lu, P. Dhillon, D. P. Foster, and L. Ungar. Faster ridge regression via the subsampled randomized hadamard transform. In NIPS, pages 369--377, 2013.
[27]
E. Mej'ıa-Roa, D. Tabas-Madrid, J. Setoain, C. Garc'ıa, F. Tirado, and A. Pascual-Montano. Nmf-mgpu: non-negative matrix factorization on multi-gpu systems. BMC bioinformatics, 16(1):1, 2015.
[28]
X. Meng, J. Bradley, B. Yuvaz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, et al. Mllib: Machine learning in apache spark. JMLR, 17(34):1--7, 2016.
[29]
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on optimization, 19(4):1574--1609, 2009.
[30]
V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons. Text mining using non-negative matrix factorizations. In SDM, pages 452--456. SIAM, 2004.
[31]
N. Pham and R. Pagh. Fast and scalable polynomial kernels via explicit feature maps. In ACM SIGKDD, pages 239--247. ACM, 2013.
[32]
M. Pilanci and M. J. Wainwright. Iterative hessian sketch: Fast and accurate solution approximation for constrained least-squares. JMLR, pages 1--33, 2015.
[33]
M. Pilanci and M. J. Wainwright. Newton sketch: A linear-time optimization algorithm with linear-quadratic convergence. arXiv preprint arXiv:1505.02250, 2015.
[34]
I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon. Overlapping community detection using bayesian non-negative matrix factorization. Physical Review E, 83(6):066114, 2011.
[35]
Y. Qian, C. Tan, N. Mamoulis, and D. W. Cheung. Dsanls: Accelerating distributed nonnegative matrix factorization via sketching. Technical report, HKU CS, 2017.
[36]
S. A. Robila and L. G. Maciak. A parallel unmixing algorithm for hyperspectral images. In Optics East, pages 63840F--63840F. International Society for Optics and Photonics, 2006.
[37]
R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM journal on control and optimization, 14(5):877--898, 1976.
[38]
F. Wang and P. Li. Efficient nonnegative matrix factorization with random projections. In SDM, pages 281--292. SIAM, 2010.
[39]
W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In SIGIR, pages 267--273. ACM, 2003.
[40]
J. Yin, L. Gao, and Z. M. Zhang. Scalable nonnegative matrix factorization with block-wise updates. In ECML-PKDD, pages 337--352. Springer, 2014.
[41]
R. Zdunek and A. Cichocki. Non-negative matrix factorization with quasi-newton optimization. In ICAISC, pages 870--879. Springer, 2006.

Cited By

View all
  • (2022)Fast and Secure Distributed Nonnegative Matrix FactorizationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.298596434:2(653-666)Online publication date: 1-Feb-2022
  • (2022)Probabilistic methods for approximate archetypal analysisInformation and Inference: A Journal of the IMA10.1093/imaiai/iaac00812:1(466-493)Online publication date: 12-May-2022

Index Terms

  1. DSANLS: Accelerating Distributed Nonnegative Matrix Factorization via Sketching

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
    February 2018
    821 pages
    ISBN:9781450355810
    DOI:10.1145/3159652
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 February 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Research-article

    Funding Sources

    • European Union's Horizon 2020 research and innovation programme
    • Hong Kong Research Grant Council

    Conference

    WSDM 2018

    Acceptance Rates

    WSDM '18 Paper Acceptance Rate 81 of 514 submissions, 16%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)8
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Fast and Secure Distributed Nonnegative Matrix FactorizationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.298596434:2(653-666)Online publication date: 1-Feb-2022
    • (2022)Probabilistic methods for approximate archetypal analysisInformation and Inference: A Journal of the IMA10.1093/imaiai/iaac00812:1(466-493)Online publication date: 12-May-2022

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media