DOI: 10.1145/2939672.2939800

Efficient Frequent Directions Algorithm for Sparse Matrices

Published: 13 August 2016

Abstract

This paper describes Sparse Frequent Directions, a variant of Frequent Directions for sketching sparse matrices. It resembles the original algorithm in many ways: both receive the rows of an input matrix $A \in \mathbb{R}^{n \times d}$ one by one in the streaming setting and compute a small sketch $B \in \mathbb{R}^{\ell \times d}$. Both share the same strong (provably optimal) asymptotic guarantees with respect to the space-accuracy tradeoff in the streaming setting. However, unlike Frequent Directions, which runs in $O(nd\ell)$ time regardless of the sparsity of the input matrix $A$, Sparse Frequent Directions runs in $\tilde{O}(\mathrm{nnz}(A)\,\ell + n\ell^2)$ time. Our analysis loosens the dependence on computing the Singular Value Decomposition (SVD) as a black box within the Frequent Directions algorithm. Our bounds require recent results on the properties of fast approximate SVD computations. Finally, we empirically demonstrate that these asymptotic improvements are practical and significant on real and synthetic data.
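To make the streaming setup concrete, the following is a minimal sketch of the original (dense) Frequent Directions baseline, not of the sparse variant this paper introduces. It assumes the commonly described formulation in which the sketch is rotated with an exact SVD and shrunk whenever it fills up. The function name `frequent_directions`, the choice of shrinkage threshold, and the demo data are illustrative assumptions rather than details taken from the abstract. For this shrinkage rule, the covariance error should satisfy $\|A^\top A - B^\top B\|_2 \le 2\|A\|_F^2 / \ell$.

```python
import numpy as np


def frequent_directions(rows, d, ell):
    """Sketch a stream of d-dimensional rows into an ell x d matrix B.

    Illustrative dense baseline (assumed formulation); uses an exact SVD.
    """
    assert 1 <= ell <= d, "this sketch assumes ell <= d"
    B = np.zeros((ell, d))
    next_free = 0  # index of the first all-zero row of B
    for row in rows:
        if next_free == ell:
            # Sketch is full: rotate onto its right singular vectors and
            # subtract delta from every squared singular value.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt  # rows ell//2 .. ell-1 are now zero
            next_free = ell // 2
        B[next_free] = row
        next_free += 1
    return B


# Demo on synthetic data (hypothetical sizes): low-rank signal plus noise.
rng = np.random.default_rng(0)
n, d, ell = 500, 40, 10
A = rng.standard_normal((n, 5)) @ rng.standard_normal((5, d))
A += 0.1 * rng.standard_normal((n, d))

B = frequent_directions(A, d, ell)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)         # spectral-norm covariance error
bound = 2.0 * np.linalg.norm(A, "fro") ** 2 / ell  # bound for this shrinkage rule
print(B.shape, err <= bound)
```

Each shrink step here costs a full SVD of the $\ell \times d$ sketch whether or not the input rows are sparse. According to the abstract, the sparse variant relies on fast approximate SVD computations instead; that part is not reproduced in this sketch.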

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. frequent directions
  2. matrix sketching
  3. sparse matrix

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 96
  • Downloads (Last 6 weeks): 26
Reflects downloads up to 10 Oct 2024

Cited By

  • (2024) Quantivine: A Visualization Approach for Large-Scale Quantum Circuit Representation and Analysis. IEEE Transactions on Visualization and Computer Graphics 30:1 (573-583). DOI: 10.1109/TVCG.2023.3327148. Online publication date: 1-Jan-2024
  • (2021) Near optimal frequent directions for sketching dense and sparse matrices. The Journal of Machine Learning Research 20:1 (2018-2040). DOI: 10.5555/3322706.3361997. Online publication date: 9-Mar-2021
  • (2021) A simple proof of a new set disjointness with applications to data streams. Proceedings of the 36th Computational Complexity Conference. DOI: 10.4230/LIPIcs.CCC.2021.37. Online publication date: 20-Jul-2021
  • (2021) Link prediction in dynamic networks using random dot product graphs. Data Mining and Knowledge Discovery 35:5 (2168-2199). DOI: 10.1007/s10618-021-00784-2. Online publication date: 1-Sep-2021
  • (2021) Overview of accurate coresets. WIREs Data Mining and Knowledge Discovery 11:6. DOI: 10.1002/widm.1429. Online publication date: 16-Sep-2021
  • (2020) Data Sketching for Real Time Analytics. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (3567-3568). DOI: 10.1145/3394486.3406480. Online publication date: 23-Aug-2020
  • (2020) Tight Sensitivity Bounds For Smaller Coresets. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2051-2061). DOI: 10.1145/3394486.3403256. Online publication date: 23-Aug-2020
  • (2020) A Fast and Accurate Frequent Directions Algorithm for Low Rank Approximation via Block Krylov Iteration. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3167-3171). DOI: 10.1109/ICASSP40776.2020.9054022. Online publication date: May-2020
  • (2018) Utility-driven graph summarization. Proceedings of the VLDB Endowment 12:4 (335-347). DOI: 10.14778/3297753.3297755. Online publication date: 1-Dec-2018
  • (2018) Sketched Follow-The-Regularized-Leader for Online Factorization Machine. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (1900-1909). DOI: 10.1145/3219819.3220044. Online publication date: 19-Jul-2018
