Generalization bounds for learning under graph-dependence: a survey

Published: 03 April 2024

Abstract

Traditional statistical learning theory relies on the assumption that data are independently and identically distributed (i.i.d.). However, this assumption does not hold in many real-life applications. In this survey, we explore learning scenarios where examples are dependent and their dependence relationship is described by a dependency graph, a model commonly used in probability and combinatorics. We collect various graph-dependent concentration bounds, which are then used to derive Rademacher-complexity and stability generalization bounds for learning from graph-dependent data. We illustrate this paradigm through practical learning tasks and outline some directions for future work. To our knowledge, this is the first survey of its kind on this subject.
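To give a concrete sense of the bounds the survey collects, here is one representative graph-dependent concentration inequality, sketched for intuition rather than quoted from the paper: Janson's (2004) Hoeffding-type bound for sums of partly dependent variables. For bounded variables X_i in [a_i, b_i] with dependency graph G, the usual i.i.d. Hoeffding exponent is deflated by the fractional chromatic number \chi^*(G), which equals 1 exactly when G has no edges (recovering the i.i.d. case):

  \Pr\left( \sum_{i=1}^{n} X_i - \mathbb{E}\sum_{i=1}^{n} X_i \ge t \right)
    \le \exp\left( - \frac{2 t^2}{\chi^*(G) \sum_{i=1}^{n} (b_i - a_i)^2} \right).

As a hypothetical illustration of where such a dependency graph arises in a practical learning task, consider bipartite ranking: each (positive, negative) pair of training examples is a vertex, and two pairs are adjacent whenever they share an underlying example. The short Python sketch below (all names are illustrative, not from the survey) builds this graph:

  from itertools import product

  def ranking_dependency_graph(pos, neg):
      """All (positive, negative) pairs, with an edge between two
      pairs whenever they share an example and are hence dependent."""
      vertices = list(product(pos, neg))
      edges = set()
      for a in range(len(vertices)):
          for b in range(a + 1, len(vertices)):
              (p1, n1), (p2, n2) = vertices[a], vertices[b]
              if p1 == p2 or n1 == n2:  # shared example => dependence
                  edges.add((a, b))
      return vertices, edges

  # 2 positives and 2 negatives give 4 pairs; each pair shares an
  # example with exactly 2 others, so the graph has 4 edges.
  verts, deps = ranking_dependency_graph(["p1", "p2"], ["n1", "n2"])
  print(len(verts), len(deps))  # prints: 4 4

Plugging a coloring number of such a graph into a bound of the form above gives the flavor of the Rademacher-complexity and stability results the survey derives for graph-dependent data.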

Published In

Machine Learning, Volume 113, Issue 7
Jul 2024
1045 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 03 April 2024
Accepted: 05 March 2024
Revision received: 29 January 2024
Received: 18 May 2022

Author Tags

  1. Generalization bounds
  2. Dependency graphs
  3. Uniform stability
  4. Rademacher complexity
  5. Bipartite ranking

Qualifiers

  • Research-article
