
Hilbert Space Embeddings and Metrics on Probability Measures

Published: 01 August 2010

Abstract

A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as γ_k, indexed by the kernel function k that defines the inner product in the RKHS.
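For intuition, γ_k admits a simple plug-in estimate: replace each mean embedding by the sample average of kernel features, which yields the familiar biased (V-statistic) maximum mean discrepancy statistic. The Python sketch below is an illustration under our own choices (a Gaussian kernel, and the function names gaussian_kernel and gamma_k_estimate), not code from the paper:

    import numpy as np

    def gaussian_kernel(X, Y, sigma=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); the Gaussian kernel is
        # integrally strictly positive definite, hence characteristic.
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    def gamma_k_estimate(X, Y, sigma=1.0):
        # Biased (V-statistic) estimate of gamma_k(P, Q): the RKHS distance
        # between the empirical mean embeddings of the two samples,
        # gamma_k^2 ~ mean k(x_i, x_j) + mean k(y_i, y_j) - 2 mean k(x_i, y_j).
        k_xx = gaussian_kernel(X, X, sigma).mean()
        k_yy = gaussian_kernel(Y, Y, sigma).mean()
        k_xy = gaussian_kernel(X, Y, sigma).mean()
        return np.sqrt(k_xx + k_yy - 2.0 * k_xy)

    # Toy check: samples from two Gaussians that differ in mean.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(500, 1))
    Y = rng.normal(0.5, 1.0, size=(500, 1))
    print(gamma_k_estimate(X, Y))  # clearly larger than for two samples of the same law

Because the Gaussian kernel is characteristic, this estimate converges to zero as the sample sizes grow if and only if the two distributions coincide.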
We present three theoretical properties of γ_k. First, we consider the question of determining the conditions on the kernel k for which γ_k is a metric: such k are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g., on compact domains) and are difficult to check, our conditions are straightforward and intuitive: integrally strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on ℝ^d, then it is characteristic if and only if the support of its Fourier transform is the entire ℝ^d. Second, we show that the distance between distributions under γ_k results from an interplay between the properties of the kernel and the distributions, by demonstrating that distributions are close in the embedding space when their differences occur at higher frequencies. Third, to understand the nature of the topology induced by γ_k, we relate γ_k to other popular metrics on probability measures, and present conditions on the kernel k under which γ_k metrizes the weak topology.
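The second property is easiest to see in the frequency domain. For a bounded continuous translation-invariant kernel k(x, y) = ψ(x − y), Bochner's theorem represents ψ as the Fourier transform of a finite nonnegative measure Λ, and then (up to the constants of the chosen Fourier convention) γ_k is a weighted L² distance between characteristic functions:

    γ_k²(P, Q) = ∫_{ℝ^d} |φ_P(ω) − φ_Q(ω)|² dΛ(ω),

where φ_P and φ_Q are the characteristic functions of P and Q. This identity exhibits both claims at once: γ_k(P, Q) = 0 forces φ_P = φ_Q Λ-almost everywhere, which implies P = Q precisely when supp(Λ) = ℝ^d; and if φ_P − φ_Q is concentrated at frequencies where Λ has little mass (e.g., high frequencies under a Gaussian kernel, whose spectral measure decays exponentially), then γ_k(P, Q) is small even though P ≠ Q.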

      Published In

      cover image The Journal of Machine Learning Research
      The Journal of Machine Learning Research  Volume 11, Issue
      3/1/2010
      3637 pages
      ISSN:1532-4435
      EISSN:1533-7928
      Issue’s Table of Contents

      Publisher

      JMLR.org

      Publication History

      Published: 01 August 2010
      Published in JMLR Volume 11
