Generalization bounds for learning under graph-dependence: a survey

Published: 03 April 2024

Abstract

Traditional statistical learning theory relies on the assumption that data are independently and identically distributed (i.i.d.). However, this assumption does not hold in many real-life applications. In this survey, we explore learning scenarios where examples are dependent and their dependence relationship is described by a dependency graph, a model commonly used in probability and combinatorics. We collect various graph-dependent concentration bounds, which are then used to derive Rademacher-complexity and stability generalization bounds for learning from graph-dependent data. We illustrate this paradigm through practical learning tasks and outline some directions for future work. To our knowledge, this is the first survey of its kind on this subject.
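To give a concrete sense of the bounds the survey collects, here is one representative graph-dependent concentration inequality, sketched for intuition rather than quoted from the paper: Janson's (2004) Hoeffding-type bound for sums of partly dependent variables. For bounded variables X_i in [a_i, b_i] with dependency graph G, the usual i.i.d. Hoeffding exponent is deflated by the fractional chromatic number \chi^*(G), which equals 1 exactly when G has no edges (recovering the i.i.d. case):

  \Pr\left( \sum_{i=1}^{n} X_i - \mathbb{E}\sum_{i=1}^{n} X_i \ge t \right)
    \le \exp\left( - \frac{2 t^2}{\chi^*(G) \sum_{i=1}^{n} (b_i - a_i)^2} \right).

As a hypothetical illustration of where such a dependency graph arises in a practical learning task, consider bipartite ranking: each (positive, negative) pair of training examples is a vertex, and two pairs are adjacent whenever they share an underlying example. The short Python sketch below (all names are illustrative, not from the survey) builds this graph:

  from itertools import product

  def ranking_dependency_graph(pos, neg):
      """All (positive, negative) pairs, with an edge between two
      pairs whenever they share an example and are hence dependent."""
      vertices = list(product(pos, neg))
      edges = set()
      for a in range(len(vertices)):
          for b in range(a + 1, len(vertices)):
              (p1, n1), (p2, n2) = vertices[a], vertices[b]
              if p1 == p2 or n1 == n2:  # shared example => dependence
                  edges.add((a, b))
      return vertices, edges

  # 2 positives and 2 negatives give 4 pairs; each pair shares an
  # example with exactly 2 others, so the graph has 4 edges.
  verts, deps = ranking_dependency_graph(["p1", "p2"], ["n1", "n2"])
  print(len(verts), len(deps))  # prints: 4 4

Plugging a coloring number of such a graph into a bound of the form above gives the flavor of the Rademacher-complexity and stability results the survey derives for graph-dependent data.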

Published In

Machine Learning, Volume 113, Issue 7
Jul 2024
1045 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 03 April 2024
Accepted: 05 March 2024
Revision received: 29 January 2024
Received: 18 May 2022

Author Tags

  1. Generalization bounds
  2. Dependency graphs
  3. Uniform stability
  4. Rademacher complexity
  5. Bipartite ranking

Qualifiers

  • Research-article
