DOI: 10.5555/2722129.2722163

Approximate resilience, monotonicity, and the complexity of agnostic learning

Published: 04 January 2015

Abstract

A function f is d-resilient if all its Fourier coefficients of degree at most d are zero, i.e., f is uncorrelated with all low-degree parities. We study the notion of approximate resilience of Boolean functions, where we say that f is α-approximately d-resilient if f is α-close to a [−1, 1]-valued d-resilient function in ℓ1 distance. We show that approximate resilience essentially characterizes the complexity of agnostic learning of a concept class C over the uniform distribution. Roughly speaking, if all functions in a class C are far from being d-resilient, then C can be learned agnostically in time n^{O(d)}; conversely, if C contains a function close to being d-resilient, then agnostic learning of C in the statistical query (SQ) framework of Kearns has complexity at least n^{Ω(d)}.
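
To make the definition concrete, here is a minimal brute-force sketch (ours, not the paper's; the helper names fourier_coefficient and is_d_resilient are illustrative). It computes the Fourier coefficients f_hat(S) = E_x[f(x) · prod_{i in S} x_i] of a function f: {-1,1}^n -> {-1,1} under the uniform distribution and tests exact d-resilience, i.e. that f is uncorrelated with every parity on at most d variables.

    # Brute-force Fourier coefficients of f: {-1,1}^n -> {-1,1} over the uniform
    # distribution, and a test of exact d-resilience (all names here are ours).
    from itertools import combinations, product

    def fourier_coefficient(f, n, S):
        """f_hat(S) = E_x[ f(x) * prod_{i in S} x_i ], x uniform over {-1,1}^n."""
        total = 0.0
        for x in product([-1, 1], repeat=n):
            chi = 1
            for i in S:
                chi *= x[i]
            total += f(x) * chi
        return total / 2 ** n

    def is_d_resilient(f, n, d, tol=1e-12):
        """True iff f is uncorrelated with every parity on at most d variables."""
        return all(abs(fourier_coefficient(f, n, S)) <= tol
                   for k in range(d + 1)
                   for S in combinations(range(n), k))

    # Parity of all 3 bits is 2-resilient; majority of 3 bits is 0- but not 1-resilient.
    parity = lambda x: x[0] * x[1] * x[2]
    majority = lambda x: 1 if sum(x) > 0 else -1
    print(is_d_resilient(parity, 3, 2))    # True
    print(is_d_resilient(majority, 3, 0))  # True: majority is balanced
    print(is_d_resilient(majority, 3, 1))  # False: each degree-1 coefficient is 1/2
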
Focusing on monotone Boolean functions, we exhibit the existence of near-optimal α-approximately Ω̃(α√n)-resilient monotone functions for all α > 0. Prior to our work, it was even conceivable that every monotone function is Ω(1)-far from any 1-resilient function. Furthermore, we construct simple, explicit monotone functions based on Tribes and CycleRun that are close to highly resilient functions. Our constructions are based on general resilience analysis and amplification techniques we introduce. These structural results, together with the characterization, imply nearly optimal lower bounds for agnostic learning of monotone juntas, a natural variant of the well-studied junta learning problem. In particular, we show that no SQ algorithm can efficiently agnostically learn monotone k-juntas for any k = ω(1) and any constant error less than 1/2.
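
As a rough numerical companion (ours, not the paper's resilience analysis or amplification argument; assuming ℓ1 distance means the expected absolute difference under the uniform distribution): if f is α-approximately d-resilient, then |f_hat(S)| ≤ α for every set S of size at most d, so the largest low-degree Fourier coefficient of f lower-bounds the achievable α. The sketch below checks this necessary condition on a toy instance of the standard Tribes function (an OR of ANDs over disjoint blocks); at such tiny sizes the bound is far from zero, consistent with the paper constructing functions based on Tribes rather than using Tribes directly.

    # Check of a necessary condition for approximate resilience on a toy Tribes
    # instance (illustrative only; this is not the paper's construction).
    from itertools import combinations, product

    def fourier_coefficient(f, n, S):
        total = 0.0
        for x in product([-1, 1], repeat=n):
            chi = 1
            for i in S:
                chi *= x[i]
            total += f(x) * chi
        return total / 2 ** n

    def tribes(x, w):
        # +1 encodes True; output +1 iff some width-w block is all-True (monotone).
        return 1 if any(all(b == 1 for b in x[i:i + w])
                        for i in range(0, len(x), w)) else -1

    n, w, d = 8, 2, 2          # four tribes of width 2; brute force only for small n
    f = lambda x: tribes(x, w)
    alpha_lower_bound = max(abs(fourier_coefficient(f, n, S))
                            for k in range(d + 1)
                            for S in combinations(range(n), k))
    print(f"max |f_hat(S)| over |S| <= {d}: {alpha_lower_bound:.3f} "
          f"(a lower bound on alpha for this particular f)")
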

References

[1]
{AH11} Per Austrin and Johan Håstad. Randomly supported independence and resistance. SIAM Journal on Computing, 40(1):1--27, 2011. 1.1
[2]
{AM09} Per Austrin and Elchanan Mossel. Approximation resistant predicates from pairwise independence. Computational Complexity, 18(2):249--271, 2009. 1.1
[3]
{BBL98} A. Blum, C. Burch, and J. Langford. On learning monotone boolean functions. In Proceedings of FOCS, pages 408--415, 1998. 1.2, 1.2, 1.2, 3.1, 3.1
[4]
{BDLSS12} Shai Ben-David, David Loker, Nathan Srebro, and Karthik Sridharan. Minimizing the misclassification error rate using a surrogate convex loss. In ICML, 2012. 1.3
[5]
{Bec75} William Beckner. Inequalities in Fourier analysis. Ann. of Math. (2), 102(1):159--182, 1975. 1.2, 3.2
[6]
{BFJ+94} Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 253--262. ACM, 1994. 1.3, 2
[7]
{Bon70} Aline Bonami. Étude des coefficients de Fourier des fonctions de L^p(G). Ann. Inst. Fourier (Grenoble), 20(fasc. 2):335--402 (1971), 1970. 1.2, 3.2
[8]
{BT96} N. Bshouty and C. Tamon. On the Fourier spectrum of monotone functions. Journal of the ACM, 43(4):747--770, 1996. 1.2, 1.2, 1.2
[9]
{CGH+85} Benny Chor, Oded Goldreich, Johan Håstad, Joel Friedman, Steven Rudich, and Roman Smolensky. The bit extraction problem or t-resilient functions. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science, pages 396--407. IEEE, 1985. 1.1
[10]
{CKKL12} M. Cheraghchi, A. Klivans, P. Kothari, and H. Lee. Submodular functions are noise stable. In SODA, pages 1586--1592, 2012. 2
[11]
{DLSS14} Amit Daniely, Nati Linial, and Shai Shalev-Shwartz. The complexity of learning halfspaces using generalized linear methods. In COLT, pages 244--286, 2014. 1.3
[12]
{DSFT+14} Dana Dachman-Soled, Vitaly Feldman, Li-Yang Tan, Andrew Wan, and Karl Wimmer. Approximate resilience, monotonicity, and the complexity of agnostic learning. arXiv, CoRR, abs/1405.5268, 2014. 1.1, 1.1, 3.2.1, 3.3, 3.3, 3.3, 3.4.1
[13]
{Fel12} V. Feldman. A complete characterization of statistical query learning with applications to evolvability. Journal of Computer and System Sciences, 78(5):1444--1459, 2012. 1.3, 2
[14]
{FGKP09} V. Feldman, P. Gopalan, S. Khot, and A. Ponnuswami. On agnostic learning of parities, monomials and halfspaces. SIAM Journal on Computing, 39(2):606--645, 2009. 4
[15]
{FGRW12} Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558--1590, 2012. 1
[16]
{FK14} V. Feldman and P. Kothari. Agnostic learning of disjunctions on symmetric distributions. arXiv, CoRR, abs/1405.6791, 2014. 1.1, 1.3
[17]
{FLS11} V. Feldman, H. Lee, and R. Servedio. Lower bounds and hardness amplification for learning shallow monotone formulas. In Journal of Machine Learning Research - COLT Proceedings, volume 19, pages 273--292, 2011. 1.3
[18]
{Hau92} D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78--150, 1992. 1
[19]
{IT68} Aleksandr Ioffe and Vladimir Tikhomirov. Duality of convex functions and extremum problems. Russ. Math. Surv., 23, 1968. 1.1, 2
[20]
{Kea98} M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983--1006, 1998. 1.1, 1.3
[21]
{KKL88} J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings of FOCS, pages 68--80, 1988. 3.4.1
[22]
{KKMS08} A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777--1805, 2008. 1, 1.1, 1.3, 1.2, 1.2, 1.3, 4
[23]
{KS07} Adam R. Klivans and Alexander A. Sherstov. Unconditional lower bounds for learning intersections of halfspaces. Machine Learning, 69(2-3):97--114, 2007. 1.3
[24]
{KS09} Adam R. Klivans and Alexander A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. J. Comput. Syst. Sci., 75(1):2--12, 2009. 4
[25]
{KS10} Adam R. Klivans and Alexander A. Sherstov. Lower bounds for agnostic learning via approximate rank. Computational Complexity, 19(4):581--604, 2010. 1, 1.3
[26]
{KSS94} M. Kearns, R. Schapire, and L. Sellie. Toward Efficient Agnostic Learning. Machine Learning, 17(2/3):115--141, 1994. 1
[27]
{Lov87} László Lovász. An algorithmic theory of numbers, graphs and convexity, volume 50. SIAM, 1987. 2
[28]
{LS11} Philip M. Long and Rocco A. Servedio. Learning large-margin halfspaces with more malicious noise. In NIPS, pages 91--99, 2011. 1.3
[29]
{LW95} Michael Luby and Avi Wigderson. Pairwise independence and derandomization. Citeseer, 1995. 1.1
[30]
{Man95} Y. Mansour. An O(n^{log log n}) learning algorithm for DNF under the uniform distribution. Journal of Computer and System Sciences, 50:543--550, 1995. 3.2.1
[31]
{MO02} Elchanan Mossel and Ryan O'Donnell. On the noise sensitivity of monotone functions. In Mathematics and Computer Science II, pages 481--495. Springer, 2002. 1.2
[32]
{MOS04} E. Mossel, R. O'Donnell, and R. Servedio. Learning functions of k relevant variables. Journal of Computer & System Sciences, 69(3):421--434, 2004. Previously published as "Learning juntas". 1.2, 1.2
[33]
{O'D03} R. O'Donnell. Computational Applications of Noise Sensitivity. PhD thesis, 2003. 1.2
[34]
{O'D13} Ryan O'Donnell. Analysis of Boolean Functions. http://analysisofbooleanfunctions.org, 2013. 3.2
[35]
{OS07} R. O'Donnell and R. Servedio. Learning monotone decision trees in polynomial time. SIAM J. Comput., 37(3):827--844, 2007. 1.2
[36]
{OW13} Ryan O'Donnell and Karl Wimmer. KKL, Kruskal-Katona, and monotone nets. SIAM J. Comput., 42(6):2375--2399, 2013. 1.2
[37]
{OW14} Ryan O'Donnell and David Witmer. Goldreich's PRG: evidence for near-optimal polynomial stretch. In Conference on Computational Complexity, 2014. 1.1, 1.2
[38]
{Ser01} R. Servedio. On learning monotone DNF under product distributions. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 558--573, 2001. 1.2
[39]
{She11} Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969--2000, 2011. 1.1, 1.3, 4, 4
[40]
{Sie84} Thomas Siegenthaler. Correlation-immunity of nonlinear combining functions for cryptographic applications. IEEE Transactions on Information Theory, 30(5):776--780, 1984. 1.2
[41]
{Sim07} H. Simon. A characterization of strong learnability in the statistical query model. In Proceedings of Symposium on Theoretical Aspects of Computer Science, pages 393--404, 2007. 1.3
[42]
{Szö09} Balázs Szörényi. Characterizing statistical query learning: simplified notions and proofs. In Algorithmic Learning Theory, pages 186--200. Springer, 2009. 1.3, 2
[43]
{Tal96} M. Talagrand. How much are increasing sets positively correlated? Combinatorica, 16(2):243--258, 1996. 1.2
[44]
{Val12} Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 11--20. IEEE, 2012. 1.2, 1.3
[45]
{Wie} Udi Wieder. Tennis for the People II. http://windowsontheory.org/2012/11/16/tennis-for-the-people-ii/. 1.2
[46]
{Yan05} Ke Yang. New lower bounds for statistical query learning. Journal of Computer and System Sciences, 70(4):485--509, 2005. 2

Cited By

  • (2019) Pseudorandomness for read-k DNF formulas. Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 10.5555/3310435.3310474, pages 621--638. Online publication date: 6-Jan-2019.
  • (2015) Adaptivity helps for testing juntas. Proceedings of the 30th Conference on Computational Complexity, 10.5555/2833227.2833240, pages 264--279. Online publication date: 17-Jun-2015.
  • (2015) Agnostic learning of disjunctions on symmetric distributions. The Journal of Machine Learning Research, 10.5555/2789272.2912108, 16(1):3455--3467. Online publication date: 1-Jan-2015.

      Published In

      SODA '15: Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms
      January 2015, 2056 pages
      • Program Chair: Piotr Indyk

      Sponsors

      • SIAM: Society for Industrial and Applied Mathematics

      Publisher

      Society for Industrial and Applied Mathematics, United States

      Publication History

      Published: 04 January 2015

      Qualifiers

      • Research-article

      Conference

      SODA '15
      Sponsor:
      • SIAM
      SODA '15: ACM SIAM Symposium on Discrete Algorithms
      January 4 - 6, 2015
      San Diego, California

      Acceptance Rates

      SODA '15 Paper Acceptance Rate 137 of 495 submissions, 28%;
      Overall Acceptance Rate 411 of 1,322 submissions, 31%
