
Robust Estimators in High-Dimensions Without the Computational Intractability

Published: 01 January 2019

Abstract

We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning, and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question: Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high dimensions that may be applicable to many other problems.
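The "general recipe" alluded to above can be illustrated by a simplified filtering sketch for robust mean estimation of an identity-covariance Gaussian: corruptions that shift the empirical mean must also inflate the empirical variance in some direction, so one repeatedly finds the top principal direction and discards the points that deviate most along it. The function name, threshold, and removal rule below are illustrative choices, not the paper's exact algorithm or guarantees.

```python
import numpy as np

def filtered_mean(X, eps, threshold=1.5, max_iter=50):
    """Illustrative filter for robust mean estimation under eps-corruption,
    assuming the inliers are drawn from N(mu, I)."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / len(X)
        # Top eigenpair of the empirical covariance.
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]
        # For N(mu, I) data every eigenvalue is close to 1; a large top
        # eigenvalue certifies that corruptions distort this direction.
        if top_val <= threshold:
            return mu
        # Score points by squared deviation along the top direction and
        # drop the eps-fraction with the largest scores.
        scores = (centered @ top_vec) ** 2
        keep = scores <= np.quantile(scores, 1 - eps)
        if keep.all():
            return mu
        X = X[keep]
    return X.mean(axis=0)
```

On data with, say, a 10% cluster of far-away outliers, the naive empirical mean is pulled a constant distance per corrupted coordinate, while the filtered estimate stays close to the true mean; the key point of the spectral certificate is that when no direction has inflated variance, the corruptions cannot have moved the mean much.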




Published In

SIAM Journal on Computing, Volume 48, Issue 2
DOI:10.1137/smjcat.48.2

Publisher

Society for Industrial and Applied Mathematics, United States


        Author Tags

        1. robust learning
        2. high-dimensions
        3. Gaussian distribution
        4. mixture models
        5. product distributions

MSC Codes

        1. 68Q25
        2. 68Q32

