Kenji Fukumizu

Papers by Kenji Fukumizu

A Kernel Stein Test for Comparing Latent Variable Models

ArXiv, 2019

We propose a nonparametric, kernel-based test to assess the relative goodness of fit of latent variable models with intractable unnormalized densities. Our test generalises the kernel Stein discrepancy (KSD) tests of (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018, Jitkrittum et al., 2018) which required exact access to unnormalized densities. Our new test relies on the simple idea of using an approximate observed-variable marginal in place of the exact, intractable one. As our main theoretical contribution, we prove that the new test, with a properly corrected threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test (Bounliphone et al., 2015), which cannot exploit the latent structure.

Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation

Computing Research Repository, 2010

A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network

Neural Networks, 1996

The Fisher information matrix of a multi-layer perceptron network can be singular at certain parameters, and in such cases many statistical techniques based on asymptotic theory cannot be applied properly. In this paper, we prove rigorously that the Fisher information matrix of a three-layer perceptron network is positive definite if and only if the network is irreducible; that is, if there is no hidden unit that makes no contribution to the output and there is no pair of hidden units that could be collapsed to a single unit without altering the input-output map. This implies that a network that has a singular Fisher information matrix can be reduced to a network with a positive definite Fisher information matrix by eliminating redundant hidden units. Copyright 1996 Elsevier Science Ltd

Local minima and plateaus in hierarchical structures of multilayer perceptrons

Neural Networks, 2000

Adaptive natural gradient learning algorithms for various stochastic models

Neural Networks, 2000

Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting

Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons

Neural Computation, 2000

The natural gradient learning method is known to have ideal performances for on-line training of multilayer perceptrons. It avoids plateaus, which give rise to slow convergence of the backpropagation method. It is Fisher efficient, whereas the conventional method is not. However, for implementing the method, it is necessary to calculate the Fisher information matrix and its inverse, which is practically very difficult. This article proposes an adaptive method of directly obtaining the inverse of the Fisher information matrix. It generalizes the adaptive Gauss-Newton algorithms and provides a solid theoretical justification of them. Simulations show that the proposed adaptive method works very well for realizing natural gradient learning.

Active Learning in Multilayer Perceptrons

Local Minima and Plateaus in Multilayer Neural Networks

Semigroup Kernels on Measures

Journal of Machine Learning Research, 2005

Characteristic Kernels on Groups and Semigroups

Kernel dimension reduction in regression

Annals of Statistics, 2009

A kernel-based causal learning algorithm

Statistical Convergence of Kernel CCA

Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

Consistency of kernel canonical correlation analysis

Journal of Machine Learning Research, 2005

While the kernel CCA has been applied in many problems, the convergence of the estimated function...

Kernel Dimensionality Reduction for Supervised Learning

by Kenji Fukumizu and Michael Jordan

Injective Hilbert Space Embeddings of Probability Measures

Statistical Consistency of Kernel Canonical Correlation Analysis

Journal of Machine Learning Research, 2007

A Kernel Statistical Test of Independence

A Linear-Time Kernel Goodness-of-Fit Test

by Wittawat Jitkrittum and Kenji Fukumizu

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness of fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.