Efficient Frequent Directions Algorithm for Sparse Matrices

Authors:

Jeff M. Phillips

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 845 - 854

https://doi.org/10.1145/2939672.2939800

Published: 13 August 2016

Abstract

This paper describes Sparse Frequent Directions, a variant of Frequent Directions for sketching sparse matrices. It resembles the original algorithm in many ways: both receive the rows of an input matrix A ∈ R^{n × d} one by one in the streaming setting and compute a small sketch B ∈ R^{l × d}. Both share the same strong (provably optimal) asymptotic guarantees with respect to the space-accuracy tradeoff in the streaming setting. However, unlike Frequent Directions, which runs in O(ndl) time regardless of the sparsity of the input matrix A, Sparse Frequent Directions runs in Õ(nnz(A)l + nl²) time, where nnz(A) is the number of non-zero entries of A. Our analysis loosens the dependence on computing the Singular Value Decomposition (SVD) as a black box within the Frequent Directions algorithm. Our bounds require recent results on the properties of fast approximate SVD computations. Finally, we empirically demonstrate that these asymptotic improvements are practical and significant on real and synthetic data.
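
For orientation, the dense baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the original Frequent Directions idea (stream rows into a small buffer, and shrink the singular values whenever the buffer fills), not the Sparse Frequent Directions algorithm of this paper. The function name `frequent_directions`, its parameters, and the choice of a buffer with 2l rows are assumptions made for the example. The exact SVD call used here is the step that, per the abstract, the sparse variant replaces with a fast approximate computation.

```python
import numpy as np


def frequent_directions(rows, d, ell):
    """Dense Frequent Directions sketch (illustrative, hypothetical helper).

    Streams the rows of A once and returns a (2*ell) x d buffer B such that
    ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.
    """
    B = np.zeros((2 * ell, d))
    next_zero = 0  # index of the first all-zero row in B
    for row in rows:
        if next_zero == 2 * ell:
            # Buffer is full: rotate onto singular directions and shrink.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            # Squared ell-th singular value (0 if B has fewer than ell).
            delta = s[ell - 1] ** 2 if len(s) >= ell else 0.0
            shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B[:] = 0.0
            B[: len(s)] = shrunk[:, None] * Vt
            # Rows from index ell-1 onward are now zero.
            next_zero = min(ell - 1, len(s))
        B[next_zero] = row
        next_zero += 1
    return B


# Usage on a small random dense matrix (assumed test data).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 30))
ell = 5
B = frequent_directions(A, d=30, ell=ell)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / ell
```

Each shrink step costs one SVD of a 2l × d matrix and happens roughly once every l rows, which is where the O(ndl) total running time of the dense algorithm comes from, independent of how many entries of A are zero.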



Index Terms

Efficient Frequent Directions Algorithm for Sparse Matrices
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Factorization methods
2. Theory of computation
  1. Design and analysis of algorithms


Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2016

2176 pages

ISBN:9781450342322

DOI:10.1145/2939672

General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)

Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

NSF

Conference

KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2016

San Francisco, California, USA

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
