Abstract We describe distributed algorithms for two widely-used topic models, namely the Latent Dirichlet Allocation (LDA) model and the Hierarchical Dirichlet Process (HDP) model. In our distributed algorithms the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. We propose two distributed algorithms for LDA. The first algorithm is a straightforward mapping of LDA to a distributed processor setting.
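The "straightforward mapping" can be illustrated by an approximate distributed collapsed Gibbs scheme: each (simulated) processor sweeps its own document shard against a local copy of the topic-word counts, and the net changes are then merged back into the global counts. A minimal single-machine sketch in that spirit, not the paper's exact algorithm; the toy corpus, hyperparameters, and two-shard split are illustrative assumptions:

```python
import numpy as np

def local_sweep(shard, z, njw, nj, ndk, offset, alpha, beta, rng):
    # One collapsed-Gibbs sweep over one processor's document shard,
    # updating only this processor's local copies njw, nj of the
    # global topic-word and topic counts.
    K, V = njw.shape
    for d, doc in enumerate(shard):
        for n, w in enumerate(doc):
            k = z[offset + d][n]
            njw[k, w] -= 1; nj[k] -= 1; ndk[offset + d, k] -= 1
            p = (ndk[offset + d] + alpha) * (njw[:, w] + beta) / (nj + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[offset + d][n] = k
            njw[k, w] += 1; nj[k] += 1; ndk[offset + d, k] += 1

# Toy corpus: 4 tiny documents over 5 word ids, split across two
# simulated processors (all sizes here are illustrative).
rng = np.random.default_rng(0)
docs = [[0, 0, 1], [1, 1, 2], [3, 4, 4], [2, 3, 4]]
K, V, alpha, beta = 2, 5, 0.1, 0.1
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
njw = np.zeros((K, V)); ndk = np.zeros((len(docs), K))
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        njw[z[d][n], w] += 1; ndk[d, z[d][n]] += 1
nj = njw.sum(axis=1)

shards = [(docs[:2], 0), (docs[2:], 2)]
for _ in range(20):
    local_counts = []
    for shard, offset in shards:
        cnjw, cnj = njw.copy(), nj.copy()
        local_sweep(shard, z, cnjw, cnj, ndk, offset, alpha, beta, rng)
        local_counts.append(cnjw)
    # Merge step: global counts absorb each processor's net change,
    # njw <- njw + sum_p (njw_p - njw).
    njw = njw + sum(c - njw for c in local_counts)
    nj = njw.sum(axis=1)
```

Because each token is assigned exactly once across shards, the merge conserves the total token count even though the processors sweep against stale copies (the source of the approximation).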
We describe factored spectrally transformed linear mixed models (FaST-LMM), an algorithm for genome-wide association studies (GWAS) that scales linearly with cohort size in both run time and memory use. On Wellcome Trust data for 15,000 individuals, FaST-LMM ran an order of magnitude faster than current efficient algorithms. Our algorithm can analyze data for 120,000 individuals in just a few hours, whereas current algorithms fail on data for even 20,000 individuals (http://mscompbio.codeplex.com/).
Abstract: How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier's confidence scores.
Abstract We introduce a new cluster-cumulant expansion (CCE) based on the fixed points of iterative belief propagation (IBP). This expansion is similar in spirit to the loop-series (LS) recently introduced in [1].
The page budget stood at 1872 pages during 2011 and will be maintained at the same level in 2012. As a measure of overall timely review, over the last 12 months the delay from submission to first notification has been under three months. We are striving to improve this turnaround period and I am positive that this will be achieved thanks to the effort of the great team that we have. I would like to extend my gratitude to the wonderful team of Associate Editors, Guest Editors, reviewers, and IEEE Computer Society staff.
In this paper we consider space-times containing matter expanding or contracting according to a time-dependent scale factor. Cosmologies with vanishing, positive or negative cosmological constant are considered. In the case of vanishing or negative cosmological constant, open and closed spatial surfaces are solutions, while in the case of positive cosmological constant only closed surfaces exist. The gravitational field is solved explicitly in the case of 1 or 2 particles, 1 black hole, and the 1-black-hole vacuum state.
Abstract. We find that the momentum conjugate to the relative distance between two gravitating particles in their centre-of-mass frame is a hyperbolic angle. This fact suggests that momentum space can be defined consistently on a hyperboloid. We investigate the effect of quantization on this curved momentum space. The coordinates are represented by noncommuting Hermitian operators. We also find that there is a smallest distance between the two particles of one quarter of the Planck length.
Abstract: Variational Bayesian inference and (collapsed) Gibbs sampling are two important classes of inference algorithms for Bayesian networks. Both have their advantages and disadvantages: collapsed Gibbs sampling is unbiased but is inefficient for large count values and requires averaging over many samples to reduce variance. On the other hand, variational Bayesian inference is efficient and accurate for large count values but suffers from bias for small counts.
Abstract. In this paper we describe the matter-free toroidal spacetime in 't Hooft's polygon approach to (2+1)-dimensional gravity. First we show that the constraint algebra of the polygons closes (this is a general result, not restricted to the torus). Next we construct a one-polygon torus and find (in contrast to earlier results in the literature) that this slicing of spacetime is not compatible with all the solutions that emerge in the continuum formulation. Finally, we remedy this situation by adding one more polygon.
Abstract: We extend the herding algorithm to continuous spaces by using the kernel trick. The resulting "kernel herding" algorithm is an infinite-memory deterministic process that learns to approximate a PDF with a collection of samples. We show that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate O(1/T), which is much faster than the usual O(1/√T) for i.i.d. random samples. We illustrate kernel herding by approximating Bayesian predictive distributions.
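The greedy structure of kernel herding can be sketched over a finite candidate pool: at each step, pick the point that maximizes the mean embedding of p minus the mean embedding of the samples chosen so far. A minimal sketch, not the paper's implementation; the RBF kernel, bandwidth, and pool-based argmax are illustrative assumptions:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian RBF kernel matrix between two 1-D point sets.
    return np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)

def kernel_herding(candidates, n_select, gamma=1.0):
    # Empirical mean embedding mu_p(x) = E_{x'~p}[k(x, x')],
    # estimated here from the candidate pool itself.
    K = rbf(candidates, candidates, gamma)
    mu_p = K.mean(axis=1)
    chosen_idx = []
    kernel_sum = np.zeros_like(mu_p)
    for t in range(n_select):
        # Herding argmax: favour regions where p has mass but the
        # already-chosen samples do not.
        score = mu_p - kernel_sum / (t + 1)
        i = int(np.argmax(score))
        chosen_idx.append(i)
        kernel_sum += K[:, i]
    return candidates[chosen_idx]

rng = np.random.default_rng(0)
pool = rng.normal(size=1000)       # candidate pool drawn from p = N(0, 1)
samples = kernel_herding(pool, 50)
```

After only 50 deterministic samples, moments such as the mean of the herded set already match those of p closely, reflecting the fast O(1/T) decrease of expectation errors.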
Abstract. We use a computer to follow the evolution of two gravitating particles in a (2+1)-dimensional closed universe. In a closed universe there is enough energy to produce a Gott pair, i.e. a pair of particles with a tachyonic centre of mass, from regular initial data. We study such a pair and find that they can wind around each other with ever-increasing momentum. As was shown by 't Hooft, the universe must crunch before any closed timelike curve can be traversed.
Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.
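The generative procedure this architecture implies (Gibbs sampling in the top-level associative memory, then a directed top-down pass to the data layer) can be sketched for a toy three-layer net. The random untrained weights, the absence of bias terms, and all layer sizes are simplifying assumptions for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_dbn(rng, W_top, W_down, n_gibbs=50):
    # The top two layers form an RBM (undirected weights W_top):
    # run alternating Gibbs sampling there, treating it as the
    # associative memory of the deep belief net.
    h2 = (rng.random(W_top.shape[1]) < 0.5).astype(float)
    for _ in range(n_gibbs):
        h1 = (rng.random(W_top.shape[0]) < sigmoid(W_top @ h2)).astype(float)
        h2 = (rng.random(W_top.shape[1]) < sigmoid(W_top.T @ h1)).astype(float)
    # Directed, top-down stochastic pass to the visible (data) layer.
    v = (rng.random(W_down.shape[0]) < sigmoid(W_down @ h1)).astype(float)
    return v

rng = np.random.default_rng(0)
W_top = rng.normal(scale=0.1, size=(8, 8))    # RBM between top two layers
W_down = rng.normal(scale=0.1, size=(16, 8))  # directed weights to data layer
v = sample_dbn(rng, W_top, W_down)            # one binary data vector
```

In a trained net the weights would come from greedy layer-wise learning; here they only serve to show the two-phase sampling structure.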
In this class we will study two methods that model data which are distributed on a curved manifold in some D-dimensional space. We have seen how to model data which lie on a linear subspace through methods like Factor Analysis (FA) and Probabilistic Principal Component Analysis (PPCA). Here we will address the issue of how to deal with data that are also intrinsically lower-dimensional, but are "embedded" nonlinearly in a higher-dimensional space, i.e. they lie on a curved manifold in this space.
Abstract The radio spectrum in wireless communication systems is being allocated quickly with ever-increasing demands in the wireless industry. Cognitive radio is a way of opportunistically sharing the scarce spectrum among primary and secondary users of the spectrum. The key challenge in deploying cognitive radio networks is to find the spectrum holes in the primary wireless systems in order to allow the secondary users to operate.
This is a set of MATLAB functions related to learning in undirected graphical models or Markov Random Fields (MRFs). It is part of an effort to create a publicly available repository where authors can contribute code and benchmark datasets that will allow for free comparison of various methods for learning in MRFs.
Abstract: In recent years a number of methods have been developed for automatically learning the (sparse) connectivity structure of Markov Random Fields. These methods are mostly based on L1-regularized optimization, which has a number of disadvantages, such as the inability to assess model uncertainty and expensive cross-validation to find the optimal regularization parameter.
Abstract: The Gibbs sampler is one of the most popular algorithms for inference in statistical models. In this paper, we introduce a herding variant of this algorithm, called herded Gibbs, that is entirely deterministic. We prove that herded Gibbs has an O(1/T) convergence rate for models with independent variables and for fully connected probabilistic graphical models. Herded Gibbs is shown to outperform Gibbs in the tasks of image denoising with MRFs and named entity recognition with CRFs.
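For the independent-variables case, the deterministic rule is easy to sketch: each variable keeps a herding weight, the coin flip of ordinary Gibbs is replaced by a threshold on that weight, and the weight is updated by the gap between the target probability and the emitted value. A minimal sketch under those assumptions; the initialisation and sweep count are illustrative choices, not taken from the paper:

```python
import numpy as np

def herded_gibbs_independent(p, n_sweeps):
    # Herded Gibbs for a product of independent Bernoulli variables.
    # Sampling x_i ~ Bernoulli(p_i) is replaced by the deterministic
    # rule x_i = [w_i > 0], followed by the weight update
    # w_i += p_i - x_i; empirical frequencies approach p at O(1/T).
    p = np.asarray(p, dtype=float)
    w = p - 0.5                      # illustrative initialisation
    counts = np.zeros_like(p)
    for _ in range(n_sweeps):
        x = (w > 0).astype(float)    # deterministic "sample"
        w += p - x                   # herding weight update
        counts += x
    return counts / n_sweeps

freqs = herded_gibbs_independent([0.2, 0.5, 0.9], n_sweeps=1000)
```

In the fully connected case analysed in the paper, a weight is kept per conditioning state rather than per variable, but the threshold-and-update pattern is the same.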
Abstract Most infinite mixture models in the current literature are based on the Dirichlet process prior. This prior on partitions implies a very specific (a priori) distribution on cluster sizes. A slightly more general prior known as the Pitman-Yor process prior generalizes this to a two-parameter family. The latter is the most general exchangeable partition probability function (EPPF), as defined by Pitman (2002), known to date. I want to argue that it is desirable to have more flexibility in expressing our prior beliefs over cluster sizes.
Abstract We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman's coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs-type inference can be used.
Papers by Max Welling