Tight bounds for lp oblivious subspace embeddings

Published: 06 January 2019

Abstract

An lp oblivious subspace embedding is a distribution over r × n matrices Π such that for any fixed n × d matrix A,

Pr_Π[ for all x, ||Ax||p ≤ ||ΠAx||p ≤ k||Ax||p ] ≥ 9/10,

where r is the dimension of the embedding, k is the distortion of the embedding, and for an n-dimensional vector y, ||y||p = (Σ_{i=1}^n |yi|^p)^{1/p} is the lp-norm. Another important property of Π is its sparsity, that is, the maximum number of non-zero entries per column, as this determines the running time of computing Π · A. While for p = 2 there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, much less was known for the important case of 1 ≤ p < 2. In this paper we obtain nearly optimal tradeoffs for lp oblivious subspace embeddings for every 1 ≤ p < 2. Our main results are as follows:
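To make the definition concrete, the following toy sketch (not the paper's construction) draws a dense matrix Π with Cauchy (1-stable) entries, a classical ingredient of l1 embeddings, and empirically measures the ratio ||ΠAx||1 / ||Ax||1 over random directions x. The 1/r scaling and the guarantees here are illustrative assumptions only:

```python
import numpy as np

def lp_norm(y, p):
    # ||y||_p = (sum_i |y_i|^p)^(1/p)
    return (np.abs(y) ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
n, d, r = 1000, 5, 60
A = rng.standard_normal((n, d))

# Dense Cauchy sketch: 1-stable entries, heuristically scaled by 1/r.
# A stand-in for an l1 oblivious subspace embedding, chosen for
# illustration; it does not match the paper's optimal parameters.
Pi = rng.standard_cauchy((r, n)) / r
SA = Pi @ A  # the sketched matrix has only r rows instead of n

# Observe the empirical distortion over random directions x.
ratios = [lp_norm(SA @ x, 1) / lp_norm(A @ x, 1)
          for x in rng.standard_normal((200, d))]
print(min(ratios), max(ratios))
```

The spread between the smallest and largest ratio is the empirical analogue of the distortion k in the definition above.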
1. We show that for every 1 ≤ p < 2, any oblivious subspace embedding with dimension r has distortion k = [MATH HERE]. When r = poly(d) ⩽ n, as in applications, this gives a k = Ω(d^{1/p} / log^{2/p} d) lower bound, and shows that the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for p = 1 and the oblivious subspace embedding of Meng and Mahoney (STOC, 2013) for 1 < p < 2 are optimal up to poly(log(d)) factors.
2. We give sparse oblivious subspace embeddings for every 1 ≤ p < 2 that are optimal in dimension and distortion, up to poly(log d) factors. Importantly, for p = 1 we achieve r = O(d log d), k = O(d log d), and s = O(log d) non-zero entries per column. The best previous construction with s ≤ poly(log d) is due to Woodruff and Zhang (COLT, 2013), giving k = Ω(d^2 · poly(log d)) or [MATH HERE] and r ≥ d · poly(log d); in contrast, our r = O(d log d) and k = O(d log d) are optimal up to poly(log(d)) factors even for dense matrices.
We also give (1) nearly-optimal lp oblivious subspace embeddings with an expected 1 + ε non-zero entries per column for arbitrarily small ε > 0, and (2) the first oblivious subspace embeddings for 1 ≤ p < 2 with O(1) distortion and dimension independent of n. Oblivious subspace embeddings are crucial in distributed and streaming environments, as well as for entrywise lp low-rank approximation. Our results give improved algorithms for these applications.
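The role of the sparsity parameter s can be seen directly in how a sketch is applied. Below is a generic CountSketch-style skeleton (an illustrative assumption, not the paper's construction): each of the n columns of Π receives s non-zero entries at uniformly random rows, here filled with heuristically scaled Cauchy values for p = 1. Applying Π to A then touches each row of A only s times, i.e., O(s · nnz(A)) time:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r, s = 10_000, 8, 400, 5  # s = non-zeros per column of Pi

# Random placement and values for the s non-zeros in each column.
# Cauchy values with a 1/s scaling are an illustrative choice; the
# paper tunes the construction to get the stated optimal bounds.
rows = rng.integers(0, r, size=(n, s))
vals = rng.standard_cauchy((n, s)) / s

A = rng.standard_normal((n, d))

# Computing Pi @ A without materializing Pi: each row A[j] is
# scattered into s rows of the sketch, so the total cost is
# O(s * nnz(A)) -- column sparsity governs the sketching time.
SA = np.zeros((r, d))
for j in range(n):
    for t in range(s):
        SA[rows[j, t]] += vals[j, t] * A[j]
print(SA.shape)
```

With s = O(log d), as in result 2 above, this cost is within a logarithmic factor of simply reading the input matrix.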

References

[1]
Nir Ailon and Bernard Chazelle. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 557--563. ACM, 2006.
[2]
Alexandr Andoni. High frequency moments via max-stability. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 6364--6368. IEEE, 2017.
[3]
Alexandr Andoni, Khanh Do Ba, Piotr Indyk, and David Woodruff. Efficient sketches for earth-mover distance, with applications. In Foundations of Computer Science, 2009. FOCS'09. 50th Annual IEEE Symposium on, pages 324--330. IEEE, 2009.
[4]
Herman Auerbach. On the area of convex curves with conjugate diameters. PhD thesis, University of Lwów, 1930.
[5]
J. Bourgain, J. Lindenstrauss, and V. Milman. Approximation of zonoids by zonotopes. Acta mathematica, 162(1):73--141, 1989.
[6]
Jean Bourgain, Sjoerd Dirksen, and Jelani Nelson. Toward a unified theory of sparse dimensionality reduction in euclidean space. Geometric and Functional Analysis, 25(4):1009--1088, 2015.
[7]
Bo Brinkman and Moses Charikar. On the impossibility of dimension reduction in l1. Journal of the ACM (JACM), 52(5):766--788, 2005.
[8]
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Automata, languages and programming, pages 784--784, 2002.
[9]
Moses Charikar and Amit Sahai. Dimension reduction in the l1 norm. In Foundations of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium on, pages 551--560. IEEE, 2002.
[10]
Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, and David P. Woodruff. The fast Cauchy transform and faster robust linear regression. SIAM Journal on Computing, 45(3):763--810, 2016.
[11]
Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 205--214, 2009.
[12]
Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 81--90. ACM, 2013.
[13]
Michael B. Cohen. Nearly tight oblivious subspace embeddings by trace inequalities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 278--287. Society for Industrial and Applied Mathematics, 2016.
[14]
Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 181--190. ACM, 2015.
[15]
Michael B. Cohen and Richard Peng. lp row sampling by Lewis weights. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 183--192. ACM, 2015.
[16]
Anirban Dasgupta, Petros Drineas, Boulos Harb, Ravi Kumar, and Michael W. Mahoney. Sampling algorithms and coresets for lp regression. SIAM Journal on Computing, 38(5):2060--2078, 2009.
[17]
Devdatt Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. BRICS Report Series, 3(25), 1996.
[18]
James R. Lee and Assaf Naor. Embedding the diamond graph in Lp and dimension reduction in L1. Geometric and Functional Analysis, 14(4):745--747, 2004.
[19]
Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 127--136. IEEE, 2013.
[20]
Michael W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends® in Machine Learning, 3(2):123--224, 2011.
[21]
Andreas Maurer. A bound on the deviation probability for sums of non-negative random variables. J. Inequalities in Pure and Applied Mathematics, 4(1):15, 2003.
[22]
Xiangrui Meng and Michael W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 91--100. ACM, 2013.
[23]
Jelani Nelson and Huy L. Nguyễn. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 117--126. IEEE, 2013.
[24]
Jelani Nelson and Huy L. Nguyễn. Sparsity lower bounds for dimensionality reducing maps. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 101--110. ACM, 2013.
[25]
Jelani Nelson and Huy L. Nguyễn. Lower bounds for oblivious subspace embeddings. In International Colloquium on Automata, Languages, and Programming, pages 883--894. Springer, 2014.
[26]
J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, Boston, 2018. In progress, Chapter 1 online at http://fs2.american.edu/jpnolan/www/stable/stable.html.
[27]
Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pages 143--152. IEEE, 2006.
[28]
Christian Sohler and David P. Woodruff. Subspace embeddings for the l1-norm with applications. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 755--764. ACM, 2011.
[29]
Zhao Song, David P. Woodruff, and Peilin Zhong. Low rank approximation with entrywise l1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 688--701. ACM, 2017.
[30]
Ruosong Wang and David P. Woodruff. Tight bounds for lp oblivious subspace embeddings. arXiv preprint arXiv:1801.04414, 2018.
[31]
David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1--2):1--157, 2014.
[32]
David P. Woodruff and Qin Zhang. Subspace embeddings and lp-regression using exponential random variables. In Conference on Learning Theory, pages 546--567, 2013.
[33]
Jiyan Yang, Xiangrui Meng, and Michael Mahoney. Quantile regression for large-scale applications. In International Conference on Machine Learning, pages 881--887, 2013.
[34]
Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science, pages 222--227. IEEE, 1977.

Cited By

  • (2022) Tight Bounds for ℓ1 Oblivious Subspace Embeddings. ACM Transactions on Algorithms, 18(1):1--32. DOI: 10.1145/3477537. Online publication date: 24-Jan-2022
  • (2020) Nearly linear row sampling algorithm for quantile regression. Proceedings of the 37th International Conference on Machine Learning, pages 5979--5989. DOI: 10.5555/3524938.3525493. Online publication date: 13-Jul-2020

    Published In

    SODA '19: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
    January 2019
    2993 pages

    Sponsors

    • SIAM Activity Group on Discrete Mathematics


    Publisher

    Society for Industrial and Applied Mathematics

    United States



    Qualifiers

    • Research-article

    Conference

SODA '19: Symposium on Discrete Algorithms
January 6 - 9, 2019
San Diego, California

    Acceptance Rates

    Overall Acceptance Rate 411 of 1,322 submissions, 31%
