
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

Published: 01 August 2010

Abstract

PAC-Bayes bounds are among the most accurate generalization bounds for classifiers learned from independently and identically distributed (IID) data, particularly for margin classifiers: recent contributions have shown how practical these bounds can be, either to perform model selection (Ambroladze et al., 2007) or even to directly guide the learning of linear classifiers (Germain et al., 2009). However, there are many practical situations where the training data exhibit dependencies and where the traditional IID assumption does not hold. Stating generalization bounds for such frameworks is therefore of the utmost interest, from both theoretical and practical standpoints. In this work, we propose the first--to the best of our knowledge--PAC-Bayes generalization bounds for classifiers trained on data exhibiting interdependencies. The approach undertaken to establish our results is based on the decomposition of a so-called dependency graph, which encodes the dependencies within the data, into sets of independent data, thanks to graph fractional covers. Our bounds are very general: to obtain new PAC-Bayes bounds for a specific setting, it suffices to upper-bound the fractional chromatic number of the corresponding dependency graph. We show how our results can be used to derive bounds for ranking statistics (such as the AUC) and for classifiers trained on data distributed according to a stationary β-mixing process. Along the way, we show how our approach seamlessly allows us to deal with U-processes. As a side note, we also provide a PAC-Bayes generalization bound for classifiers learned on data from stationary φ-mixing distributions.
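As a purely illustrative sketch (not code from the paper), the decomposition the abstract describes can be mimicked for the AUC case: each positive-negative pair is a node of the dependency graph, two pairs are adjacent when they share an underlying example, and any proper coloring splits the pairs into independent sub-samples. The helper names `dependency_graph` and `greedy_cover` are hypothetical, and a greedy proper coloring only yields a crude upper bound on the fractional chromatic number appearing in the bounds.

```python
from itertools import combinations

def dependency_graph(pairs):
    # Two AUC pairs are dependent iff they share an underlying example.
    adj = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        if set(p) & set(q):
            adj[p].add(q)
            adj[q].add(p)
    return adj

def greedy_cover(adj):
    # Greedy proper coloring: each color class is an independent set,
    # i.e. a sub-sample of mutually independent pairs. The number of
    # classes is an upper bound on the chromatic number, hence also on
    # the fractional chromatic number used in the chromatic bounds.
    colors = {}
    for v in adj:
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, []).append(v)
    return list(classes.values())
```

For instance, with positives {0, 1} and negatives {2, 3}, the four pairs [(0, 2), (0, 3), (1, 2), (1, 3)] split into two independent sub-samples, {(0, 2), (1, 3)} and {(0, 3), (1, 2)}, within each of which the pairs share no example.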

References

[1]
S. Agarwal and P. Niyogi. Generalization bounds for ranking algorithms via algorithmic stability. Journal of Machine Learning Research, 10:441-474, 2009.
[2]
S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393-425, 2005.
[3]
A. Ambroladze, E. Parrado-Hernandez, and J. Shawe-Taylor. Tighter PAC-Bayes bounds. In Adv. in Neural Information Processing Systems 19, pages 9-16, 2007.
[4]
K. Ataman, W. Nick, and Y. Zhang. Learning to rank by maximizing AUC with linear programming. In IEEE International Joint Conference on Neural Networks (IJCNN 2006), pages 123-129, 2006.
[5]
J.-Y. Audibert and O. Bousquet. Combining PAC-Bayesian and generic chaining bounds. Journal of Machine Learning Research, 8:863-889, 2007.
[6]
P. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497-1537, 2005.
[7]
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002.
[8]
G. Blanchard and F. Fleuret. Occam's hammer. In COLT, pages 112-126, 2007.
[9]
O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, March 2002.
[10]
U. Brefeld and T. Scheffer. AUC maximizing support vector learning. In Proc. of the ICML Workshop on ROC Analysis in Machine Learning, 2005.
[11]
O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, volume 56 of Lecture Notes-Monograph Series. Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2007.
[12]
S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36(2):844-874, April 2008.
[13]
C. Cortes and M. Mohri. AUC optimization vs. error rate minimization. In Adv. in Neural Information Processing Systems 16, 2004.
[14]
Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933-969, 2003.
[15]
P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In Proc. of the 26th Annual International Conference on Machine Learning, pages 353-360, 2009.
[16]
R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In Advances in Neural Information Processing Systems 13, pages 224-230, 2001.
[17]
W. Hoeffding. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19(3):293-325, 1948.
[18]
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
[19]
S. Janson. Large deviations for sums of partly dependent random variables. Random Structures Algorithms, 24:234-248, 2004.
[20]
L. Kontorovich and K. Ramanan. Concentration inequalities for dependent random variables via the martingale method. The Annals of Probability, 36(6):2126-2158, 2008.
[21]
A. Lacasse, F. Laviolette, M. Marchand, P. Germain, and N. Usunier. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems 19, pages 769-776, 2006.
[22]
J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273-306, 2005.
[23]
J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In Adv. in Neural Information Processing Systems 15, pages 439-446, 2002.
[24]
D. McAllester. Some PAC-Bayesian Theorems. Machine Learning, 37:355-363, 1999.
[25]
D. McAllester. Simplified PAC-Bayesian margin bounds. In Proc. of the 16th Annual Conference on Computational Learning Theory, pages 203-215, 2003.
[26]
C. McDiarmid. On the method of bounded differences. Survey in Combinatorics, pages 148-188, 1989.
[27]
M. Mohri and A. Rostamizadeh. Rademacher complexity bounds for non-i.i.d. processes. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1097-1104, 2009.
[28]
S. V. Pemmaraju. Equitable coloring extends Chernoff-Hoeffding bounds. In RANDOM-APPROX, pages 285-296, 2001.
[29]
A. Rakotomamonjy. Optimizing the area under the ROC curve with SVMs. In ROC Analysis in Artificial Intelligence, pages 71-80, 2004.
[30]
E.R. Scheinerman and D.H. Ullman. Fractional Graph Theory: A Rational Approach to the Theory of Graphs. Wiley Interscience Series in Discrete Math., 1997.
[31]
M. Seeger. PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research, 3:233-269, 2002a.
[32]
M. Seeger. The proof of McAllester's PAC-Bayesian theorem. Technical report, Institute for ANC, Edinburgh, UK, 2002b.
[33]
N. Usunier, M.-R. Amini, and P. Gallinari. A data-dependent generalisation error bound for the AUC. In Proc. of the ICML Workshop on ROC Analysis in Machine Learning, 2005.
[34]
N. Usunier, M.-R. Amini, and P. Gallinari. Generalization error bounds for classifiers trained with interdependent data. In Adv. in Neural Information Processing Systems 18, pages 1369-1376, 2006.
[35]
B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. Annals of Probability, 22(1):94-116, 1994.



      Published In

The Journal of Machine Learning Research, Volume 11 (3/1/2010)
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928

      Publisher

      JMLR.org


      Cited By

• (2024) Generalization bounds for learning under graph-dependence: a survey. Machine Learning, 113(7):3929-3959, July 2024. DOI: 10.1007/s10994-024-06536-9
• (2024) Self-certified Tuple-Wise Deep Learning. Machine Learning and Knowledge Discovery in Databases: Research Track, pages 303-320, September 2024. DOI: 10.1007/978-3-031-70344-7_18
• (2023) Shedding a PAC-Bayesian light on adaptive sliced-Wasserstein distances. Proceedings of the 40th International Conference on Machine Learning, pages 26451-26473, July 2023. DOI: 10.5555/3618408.3619510
• (2023) Optimizing Two-Way Partial AUC With an End-to-End Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):10228-10246, August 2023. DOI: 10.1109/TPAMI.2022.3185311
• (2023) Boundary-restricted metric learning. Machine Learning, 112(12):4723-4762, September 2023. DOI: 10.1007/s10994-023-06380-3
• (2023) PAC-Bayesian offline Meta-reinforcement learning. Applied Intelligence, 53(22):27128-27147, November 2023. DOI: 10.1007/s10489-023-04911-y
• (2022) Distributionally Robust Conditional Quantile Prediction with Fixed Design. Management Science, 68(3):1639-1658, March 2022. DOI: 10.1287/mnsc.2020.3903
• (2021) On empirical risk minimization with dependent and heavy-tailed data. Proceedings of the 35th International Conference on Neural Information Processing Systems, pages 8913-8926, December 2021. DOI: 10.5555/3540261.3540943
• (2018) Unsupervised Coupled Metric Similarity for Non-IID Categorical Data. IEEE Transactions on Knowledge and Data Engineering, 30(9):1810-1823, September 2018. DOI: 10.1109/TKDE.2018.2808532
• (2018) A clustering algorithm using skewness-based boundary detection. Neurocomputing, 275:618-626, January 2018. DOI: 10.1016/j.neucom.2017.09.023
