DOI: 10.5555/3618408.3618457

Adversarially robust PAC learnability of real-valued functions

Published: 23 July 2023

Abstract

We study robustness to test-time adversarial attacks in the regression setting with ℓp losses and arbitrary perturbation sets. We address the question of which function classes are PAC learnable in this setting. We show that classes of finite fat-shattering dimension are learnable in both realizable and agnostic settings. Moreover, for convex function classes, they are even properly learnable. In contrast, some non-convex function classes provably require improper learning algorithms. Our main technique is based on a construction of an adversarially robust sample compression scheme of a size determined by the fat-shattering dimension. Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest.
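To make the setting concrete: the adversarial ℓp regression loss replaces the standard loss |f(x) − y|^p with its worst case over a perturbation set U(x). The sketch below is illustrative only (it is not from the paper): the function `robust_lp_loss`, the example predictor, and the finite perturbation set are all hypothetical choices for demonstration; the paper allows arbitrary perturbation sets.

```python
def robust_lp_loss(f, x, y, perturbations, p=2):
    """Worst-case l_p regression loss at (x, y).

    `perturbations` maps the input x to its set U(x) of allowed
    adversarial inputs (finite here, purely for illustration).
    """
    return max(abs(f(z) - y) ** p for z in perturbations(x))

# Example: predictor f(x) = 2x, true label y = 2 at x = 1,
# with U(x) = {x - 0.1, x, x + 0.1}.
f = lambda x: 2 * x
U = lambda x: [x - 0.1, x, x + 0.1]

# Worst case is attained at z = 1.1: |2.2 - 2|^2 = 0.04.
loss = robust_lp_loss(f, 1.0, 2.0, U, p=2)
```

A robust learner must control this worst-case loss in expectation over the data distribution, which is what makes the problem strictly harder than standard regression.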


Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org
