DOI: 10.1145/2882903.2915228
research-article

Matrix Sketching Over Sliding Windows

Published: 14 June 2016

Abstract

Large-scale matrix computation has become essential for many data applications, and hence the problem of sketching a matrix with small space and high precision has received extensive study over the past few years. This problem is often considered in the row-update streaming model, where the data set is a matrix $A \in \mathbb{R}^{n \times d}$, and the processor receives a row ($1 \times d$) of $A$ at each timestamp. The goal is to maintain a smaller matrix (termed the approximation matrix, or simply the approximation) $B \in \mathbb{R}^{\ell \times d}$ as an approximation to $A$, such that the covariance error $\|A^T A - B^T B\|$ is small and $\ell \ll n$.
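To make the covariance error concrete, here is a minimal sketch in plain Python. The abstract does not say which matrix norm is meant at this point, so the Frobenius norm is used purely as an assumption for illustration. The names `gram`, `covariance_error`, `B_exact`, and `B_prefix` are hypothetical and do not come from the paper.

```python
import math

def gram(rows, d):
    """Return the d x d matrix M^T M for a list of length-d rows."""
    g = [[0.0] * d for _ in range(d)]
    for row in rows:                      # one row arrives per timestamp
        for i in range(d):
            for j in range(d):
                g[i][j] += row[i] * row[j]
    return g

def covariance_error(a_rows, b_rows, d):
    """Frobenius norm of A^T A - B^T B (the norm choice is an assumption)."""
    ga, gb = gram(a_rows, d), gram(b_rows, d)
    return math.sqrt(sum((ga[i][j] - gb[i][j]) ** 2
                         for i in range(d) for j in range(d)))

# A has n = 4 rows and d = 2 columns.
A = [[3.0, 0.0], [0.0, 4.0], [3.0, 0.0], [0.0, 4.0]]

# A hand-built 2-row matrix whose Gram matrix matches A^T A = diag(18, 32).
B_exact = [[math.sqrt(18.0), 0.0], [0.0, math.sqrt(32.0)]]

# A naive alternative: keep only the first two rows of A.
B_prefix = A[:2]

err_exact = covariance_error(A, B_exact, 2)    # essentially zero
err_prefix = covariance_error(A, B_prefix, 2)  # sqrt(9^2 + 16^2) = sqrt(337)
```

The two candidates have the same size, but only the first preserves $A^T A$. Which rows a sketch keeps, and how it rescales them, determines the error.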
This paper studies continuously tracking approximations to the matrix defined by a sliding window of the most recent rows. We consider both sequence-based and time-based windows. We show that maintaining $A^T A$ exactly requires linear space in the sliding window model, as opposed to $O(d^2)$ space in the streaming model. With this observation, we present three general frameworks for matrix sketching on sliding windows. The sampling techniques give random samples of the rows in the window according to their squared norms. The Logarithmic Method converts a mergeable streaming matrix sketch into a matrix sketch on time-based sliding windows. The Dyadic Interval framework converts an arbitrary streaming matrix sketch into a matrix sketch on sequence-based sliding windows. In addition to proving all algorithmic properties theoretically, we also conduct an extensive empirical study with real data sets to demonstrate the efficiency of these algorithms.
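The space gap between the two models can be illustrated with a naive baseline. This is not one of the paper's three frameworks. The class names `StreamingGram` and `SlidingGram` are hypothetical, and the window is assumed to be sequence-based (the last `window` rows). The streaming accumulator keeps only $d^2$ numbers, while the exact sliding-window version also has to buffer every row in the window so that it can subtract each one when it expires.

```python
from collections import deque

class StreamingGram:
    """Exact A^T A over the whole stream: d*d numbers, no rows kept."""
    def __init__(self, d):
        self.d = d
        self.g = [[0.0] * d for _ in range(d)]

    def update(self, row):
        for i in range(self.d):
            for j in range(self.d):
                self.g[i][j] += row[i] * row[j]

class SlidingGram:
    """Exact A^T A over the last `window` rows (naive baseline).

    Every row in the window is buffered so that its contribution
    can be subtracted when it expires: space grows with the window.
    """
    def __init__(self, d, window):
        self.d = d
        self.window = window
        self.rows = deque()
        self.g = [[0.0] * d for _ in range(d)]

    def _add(self, row, sign):
        for i in range(self.d):
            for j in range(self.d):
                self.g[i][j] += sign * row[i] * row[j]

    def update(self, row):
        self.rows.append(list(row))
        self._add(row, 1.0)
        if len(self.rows) > self.window:
            self._add(self.rows.popleft(), -1.0)   # expire the oldest row

stream = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
streaming = StreamingGram(d=2)
sliding = SlidingGram(d=2, window=2)
for r in stream:
    streaming.update(r)
    sliding.update(r)
```

After the four updates, `streaming.g` covers all rows while `sliding.g` covers only the last two, and `sliding.rows` still holds those two rows. Repeated floating-point subtraction can also accumulate rounding error in this baseline, which is another reason to treat it only as a point of comparison.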

Information & Contributors

Information

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. matrix sketching
  2. sliding window

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 3
Reflects downloads up to 10 Nov 2024.

Citations

Cited By

  • (2024) Approximate Matrix Multiplication over Sliding Windows. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3896-3906. DOI: 10.1145/3637528.3671819. Online publication date: 25-Aug-2024
  • (2023) Near-optimal k-clustering in the sliding window model. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 22934-22960. DOI: 10.5555/3666122.3667116. Online publication date: 10-Dec-2023
  • (2022) Truly Perfect Samplers for Data Streams and Sliding Windows. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 29-40. DOI: 10.1145/3517804.3524139. Online publication date: 12-Jun-2022
  • (2022) Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1183-1196. DOI: 10.1109/FOCS52979.2021.00116. Online publication date: Feb-2022
  • (2022) Fast and accurate stream processing by filtering the cold. The VLDB Journal — The International Journal on Very Large Data Bases, 28:5, pages 735-763. DOI: 10.1007/s00778-019-00560-1. Online publication date: 11-Mar-2022
  • (2021) Near optimal frequent directions for sketching dense and sparse matrices. The Journal of Machine Learning Research, 20:1, pages 2018-2040. DOI: 10.5555/3322706.3361997. Online publication date: 9-Mar-2021
  • (2021) Symmetric Norm Estimation and Regression on Sliding Windows. Computing and Combinatorics, pages 528-539. DOI: 10.1007/978-3-030-89543-3_44. Online publication date: 24-Oct-2021
  • (2020) Distributed Principal Component Analysis for Real-time Big Data Processing. Proceedings of the 7th International Conference on Networking, Systems and Security, pages 89-99. DOI: 10.1145/3428363.3428369. Online publication date: 22-Dec-2020
  • (2020) Near Optimal Linear Algebra in the Online and Sliding Window Models. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 517-528. DOI: 10.1109/FOCS46700.2020.00055. Online publication date: Nov-2020
  • (2020) Collective spatial keyword search on activity trajectories. Geoinformatica, 24:1, pages 61-84. DOI: 10.1007/s10707-019-00358-x. Online publication date: 1-Jan-2020
