DOI: 10.5555/3618408.3618457

Adversarially robust PAC learnability of real-valued functions

Published: 23 July 2023

Abstract

We study robustness to test-time adversarial attacks in the regression setting with ℓp losses and arbitrary perturbation sets. We address the question of which function classes are PAC learnable in this setting. We show that classes of finite fat-shattering dimension are learnable in both realizable and agnostic settings. Moreover, for convex function classes, they are even properly learnable. In contrast, some non-convex function classes provably require improper learning algorithms. Our main technique is based on a construction of an adversarially robust sample compression scheme of a size determined by the fat-shattering dimension. Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest.
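To make the setting concrete: the adversarial ℓp regression loss replaces the standard loss |f(x) − y|^p with its worst case over a perturbation set U(x). The sketch below is illustrative only (it is not from the paper): the function `robust_lp_loss`, the example predictor, and the finite perturbation set are all hypothetical choices for demonstration; the paper allows arbitrary perturbation sets.

```python
def robust_lp_loss(f, x, y, perturbations, p=2):
    """Worst-case l_p regression loss at (x, y).

    `perturbations` maps the input x to its set U(x) of allowed
    adversarial inputs (finite here, purely for illustration).
    """
    return max(abs(f(z) - y) ** p for z in perturbations(x))

# Example: predictor f(x) = 2x, true label y = 2 at x = 1,
# with U(x) = {x - 0.1, x, x + 0.1}.
f = lambda x: 2 * x
U = lambda x: [x - 0.1, x, x + 0.1]

# Worst case is attained at z = 1.1: |2.2 - 2|^2 = 0.04.
loss = robust_lp_loss(f, 1.0, 2.0, U, p=2)
```

A robust learner must control this worst-case loss in expectation over the data distribution, which is what makes the problem strictly harder than standard regression.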


Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org
