
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

Published: 01 August 2010

Abstract

PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, particularly for margin classifiers: recent contributions have shown how practical these bounds can be, either to perform model selection (Ambroladze et al., 2007) or even to directly guide the learning of linear classifiers (Germain et al., 2009). However, there are many practical situations where the training data exhibit dependencies and where the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first--to the best of our knowledge--PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. The approach undertaken to establish our results is based on the decomposition of a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data, thanks to graph fractional covers. Our bounds are very general: to obtain new PAC-Bayes bounds for a specific setting, it suffices to upper-bound the fractional chromatic number of the corresponding dependency graph. We show how our results can be used to derive bounds for ranking statistics (such as the AUC) and for classifiers trained on data distributed according to a stationary β-mixing process. Along the way, we show how our approach seamlessly allows us to deal with U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary φ-mixing distributions.
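As a purely illustrative sketch (not code from the paper), the decomposition the abstract describes can be mimicked for the AUC case: each positive-negative pair is a node of the dependency graph, two pairs are adjacent when they share an underlying example, and any proper coloring splits the pairs into independent sub-samples. The helper names `dependency_graph` and `greedy_cover` are hypothetical, and a greedy proper coloring only yields a crude upper bound on the fractional chromatic number appearing in the bounds.

```python
from itertools import combinations

def dependency_graph(pairs):
    # Two AUC pairs are dependent iff they share an underlying example.
    adj = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        if set(p) & set(q):
            adj[p].add(q)
            adj[q].add(p)
    return adj

def greedy_cover(adj):
    # Greedy proper coloring: each color class is an independent set,
    # i.e. a sub-sample of mutually independent pairs. The number of
    # classes is an upper bound on the chromatic number, hence also on
    # the fractional chromatic number used in the chromatic bounds.
    colors = {}
    for v in adj:
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)
    return list(classes.values())
```

For instance, with positives {0, 1} and negatives {2, 3}, the four pairs [(0, 2), (0, 3), (1, 2), (1, 3)] split into two independent sub-samples, {(0, 2), (1, 3)} and {(0, 3), (1, 2)}, within each of which the pairs share no example.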

References

[1]
S. Agarwal and P. Niyogi. Generalization bounds for ranking algorithms via algorithmic stability. Journal of Machine Learning Research, 10:441-474, 2009.
[2]
S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393-425, 2005.
[3]
A. Ambroladze, E. Parrado-Hernandez, and J. Shawe-Taylor. Tighter PAC-Bayes bounds. In Adv. in Neural Information Processing Systems 19, pages 9-16, 2007.
[4]
K. Ataman, W. Nick, and Y. Zhang. Learning to rank by maximizing AUC with linear programming. In IEEE International Joint Conference on Neural Networks (IJCNN 2006), pages 123-129, 2006.
[5]
J.-Y. Audibert and O. Bousquet. Combining PAC-Bayesian and generic chaining bounds. Journal of Machine Learning Research, 8:863-889, 2007.
[6]
P. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497-1537, 2005.
[7]
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[8]
G. Blanchard and F. Fleuret. Occam's hammer. In COLT, pages 112-126, 2007.
[9]
O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, March 2002.
[10]
U. Brefeld and T. Scheffer. AUC maximizing support vector learning. In Proc. of the ICML Workshop on ROC Analysis in Machine Learning, 2005.
[11]
O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 of Lecture Notes-Monograph Series. Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2007.
[12]
S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36(2):844-874, April 2008.
[13]
C. Cortes and M. Mohri. AUC optimization vs. error rate minimization. In Adv. in Neural Information Processing Systems 16, 2004.
[14]
Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933-969, 2003.
[15]
P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In Proc. of the 26th Annual International Conference on Machine Learning, pages 353-360, 2009.
[16]
R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In Advances in Neural Information Processing Systems 13, pages 224-230, 2001.
[17]
W. Hoeffding. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19(3):293-325, 1948.
[18]
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
[19]
S. Janson. Large deviations for sums of partly dependent random variables. Random Structures Algorithms, 24:234-248, 2004.
[20]
L. Kontorovich and K. Ramanan. Concentration inequalities for dependent random variables via the martingale method. The Annals of Probability, 36(6):2126-2158, 2008.
[21]
A. Lacasse, F. Laviolette, M. Marchand, P. Germain, and N. Usunier. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems 19, pages 769-776, 2006.
[22]
J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273-306, 2005.
[23]
J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In Adv. in Neural Information Processing Systems 15, pages 439-446, 2002.
[24]
D. McAllester. Some PAC-Bayesian Theorems. Machine Learning, 37:355-363, 1999.
[25]
D. McAllester. Simplified PAC-Bayesian margin bounds. In Proc. of the 16th Annual Conference on Computational Learning Theory, pages 203-215, 2003.
[26]
C. McDiarmid. On the method of bounded differences. Survey in Combinatorics, pages 148-188, 1989.
[27]
M. Mohri and A. Rostamizadeh. Rademacher complexity bounds for non-i.i.d. processes. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1097-1104, 2009.
[28]
S. V. Pemmaraju. Equitable coloring extends Chernoff-Hoeffding bounds. In RANDOM-APPROX, pages 285-296, 2001.
[29]
A. Rakotomamonjy. Optimizing the area under the ROC curve with SVMs. In ROC Analysis in Artificial Intelligence, pages 71-80, 2004.
[30]
E.R. Scheinerman and D.H. Ullman. Fractional Graph Theory: A Rational Approach to the Theory of Graphs. Wiley Interscience Series in Discrete Math., 1997.
[31]
M. Seeger. PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research, 3:233-269, 2002a.
[32]
M. Seeger. The proof of McAllester's PAC-Bayesian theorem. Technical report, Institute for ANC, Edinburgh, UK, 2002b.
[33]
N. Usunier, M.-R. Amini, and P. Gallinari. A data-dependent generalisation error bound for the AUC. In Proc. of the ICML Workshop on ROC Analysis in Machine Learning, 2005.
[34]
N. Usunier, M.-R. Amini, and P. Gallinari. Generalization error bounds for classifiers trained with interdependent data. In Adv. in Neural Information Processing Systems 18, pages 1369-1376, 2006.
[35]
B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. Annals of Probability, 22(1):94-116, 1994.



      Published In

The Journal of Machine Learning Research, Volume 11 (3/1/2010)
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928

      Publisher

      JMLR.org


      Cited By

• (2024) Generalization bounds for learning under graph-dependence: a survey. Machine Learning, 113(7):3929-3959, July 2024. DOI: 10.1007/s10994-024-06536-9
• (2024) Self-certified Tuple-Wise Deep Learning. Machine Learning and Knowledge Discovery in Databases: Research Track, pages 303-320, September 2024. DOI: 10.1007/978-3-031-70344-7_18
• (2023) Shedding a PAC-Bayesian light on adaptive sliced-Wasserstein distances. Proceedings of the 40th International Conference on Machine Learning, pages 26451-26473, July 2023. DOI: 10.5555/3618408.3619510
• (2023) Optimizing Two-Way Partial AUC With an End-to-End Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):10228-10246, August 2023. DOI: 10.1109/TPAMI.2022.3185311
• (2023) Boundary-restricted metric learning. Machine Learning, 112(12):4723-4762, September 2023. DOI: 10.1007/s10994-023-06380-3
• (2023) PAC-Bayesian offline Meta-reinforcement learning. Applied Intelligence, 53(22):27128-27147, November 2023. DOI: 10.1007/s10489-023-04911-y
• (2022) Distributionally Robust Conditional Quantile Prediction with Fixed Design. Management Science, 68(3):1639-1658, March 2022. DOI: 10.1287/mnsc.2020.3903
• (2021) On empirical risk minimization with dependent and heavy-tailed data. Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 8913-8926, December 2021. DOI: 10.5555/3540261.3540943
• (2018) Unsupervised Coupled Metric Similarity for Non-IID Categorical Data. IEEE Transactions on Knowledge and Data Engineering, 30(9):1810-1823, September 2018. DOI: 10.1109/TKDE.2018.2808532
• (2018) A clustering algorithm using skewness-based boundary detection. Neurocomputing, 275:618-626, January 2018. DOI: 10.1016/j.neucom.2017.09.023
