-
Learning k-Determinantal Point Processes for Personalized Ranking
Authors:
Yuli Liu,
Christian Walder,
Lexing Xie
Abstract:
The key to personalized recommendation is to predict a personalized ranking on a catalog of items by modeling the user's preferences. There are many personalized ranking approaches for item recommendation from implicit feedback, such as Bayesian Personalized Ranking (BPR) and listwise ranking. Although these methods have shown performance benefits, there are still limitations affecting recommendation performance. First, none of them directly optimize ranking of sets, causing inadequate exploitation of correlations among multiple items. Second, the diversity aspect of recommendations is insufficiently addressed compared to relevance.
In this work, we present a new optimization criterion LkP based on set probability comparison for personalized ranking that moves beyond traditional ranking-based methods. It formalizes set-level relevance and diversity ranking comparisons through a Determinantal Point Process (DPP) kernel decomposition. To confer ranking interpretability to the DPP set probabilities and prioritize the practicality of LkP, we condition the standard DPP on the cardinality k of the DPP-distributed set, known as k-DPP, a less-explored extension of DPP. The generic stochastic gradient descent based technique can be directly applied to optimizing models that employ LkP. We implement LkP in the context of both Matrix Factorization (MF) and neural network approaches, on three real-world datasets, obtaining improved relevance and diversity performance. LkP is broadly applicable, and when applied to existing recommendation models it also yields strong performance improvements, suggesting that LkP holds significant value to the field of recommender systems.
Submitted 22 June, 2024;
originally announced June 2024.
-
SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
Authors:
Xinlei Niu,
Jing Zhang,
Christian Walder,
Charles Patrick Martin
Abstract:
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the generated outputs, resulting in coherent and high-fidelity performance. Our experiments demonstrate that SoundLoCD outperforms the baseline with greatly reduced computational resources. A comprehensive ablation study further validates the contribution of each component within SoundLoCD. Demo page: https://XinleiNIU.github.io/demo-SoundLoCD/.
Submitted 24 May, 2024;
originally announced May 2024.
-
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving
Authors:
Sean Lamont,
Michael Norrish,
Amir Dezfouli,
Christian Walder,
Paul Montague
Abstract:
Artificial Intelligence for Theorem Proving has given rise to a plethora of benchmarks and methodologies, particularly in Interactive Theorem Proving (ITP). Research in the area is fragmented, with a diverse set of approaches being spread across several ITP systems. This presents a significant challenge to the comparison of methods, which are often complex and difficult to replicate. Addressing this, we present BAIT, a framework for fair and streamlined comparison of learning approaches in ITP. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks, of state-of-the-art architectures applied to the problem of formula embedding. We find that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments. This unified perspective reveals a novel end-to-end system that improves on prior work. We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically-aware embeddings. By streamlining the implementation and comparison of Machine Learning algorithms in the ITP context, we anticipate BAIT will be a springboard for future research.
Submitted 5 March, 2024;
originally announced March 2024.
-
Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Authors:
Xinlei Niu,
Christian Walder,
Jing Zhang,
Charles Patrick Martin
Abstract:
We propose the stochastic optimal path which solves the classical optimal path problem by a probability-softening solution. This unified approach transforms a wide range of DP problems into directed acyclic graphs in which all paths follow a Gibbs distribution. We show the equivalence of the Gibbs distribution to a message-passing algorithm by the properties of the Gumbel distribution and give all the ingredients required for variational Bayesian inference of a latent path, namely Bayesian dynamic programming (BDP). We demonstrate the usage of BDP in the latent space of variational autoencoders (VAEs) and propose the BDP-VAE which captures structured sparse optimal paths as latent variables. This enables end-to-end training for generative tasks in which models rely on unobserved structural information. Finally, we validate the behavior of our approach and showcase its applicability in two real-world applications: text-to-speech and singing voice synthesis. Our implementation code is available at https://github.com/XinleiNIU/LatentOptimalPathsBayesianDP.
Submitted 25 June, 2024; v1 submitted 4 June, 2023;
originally announced June 2023.
-
DualVAE: Controlling Colours of Generated and Real Images
Authors:
Keerth Rathakumar,
David Liebowitz,
Christian Walder,
Kristen Moore,
Salil S. Kanhere
Abstract:
Colour controlled image generation and manipulation are of interest to artists and graphic designers. Vector Quantised Variational AutoEncoders (VQ-VAEs) with autoregressive (AR) prior are able to produce high quality images, but lack an explicit representation mechanism to control colour attributes. We introduce DualVAE, a hybrid representation model that provides such control by learning disentangled representations for colour and geometry. The geometry is represented by an image intensity mapping that identifies structural features. The disentangled representation is obtained by two novel mechanisms: (i) a dual branch architecture that separates image colour attributes from geometric attributes, and (ii) a new ELBO that trains the combined colour and geometry representations. DualVAE can control the colour of generated images, and recolour existing images by transferring the colour latent representation obtained from an exemplar image. We demonstrate that DualVAE generates images with FID nearly two times better than VQ-GAN on a diverse collection of datasets, including animated faces, logos and artistic landscapes.
Submitted 30 May, 2023;
originally announced May 2023.
-
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Authors:
Daniel D. Johnson,
Daniel Tarlow,
Christian Walder
Abstract:
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and leads to more accurate uncertainty estimates than token-probability baselines. We also release our implementation as an open-source library at https://github.com/google-research/r_u_sure.
Submitted 28 April, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Sampled Transformer for Point Sets
Authors:
Shidi Li,
Christian Walder,
Alexander Soen,
Lexing Xie,
Miaomiao Liu
Abstract:
The sparse transformer can reduce the computational complexity of the self-attention layers to $O(n)$, whilst still being a universal approximator of continuous sequence-to-sequence functions. However, this permutation variant operation is not appropriate for direct application to sets. In this paper, we propose an $O(n)$ complexity sampled transformer that can process point set elements directly without any additional inductive bias. Our sampled transformer introduces random element sampling, which randomly splits point sets into subsets, followed by applying a shared Hamiltonian self-attention mechanism to each subset. The overall attention mechanism can be viewed as a Hamiltonian cycle in the complete attention graph, and the permutation of point set elements is equivalent to randomly sampling Hamiltonian cycles. This mechanism implements a Monte Carlo simulation of the $O(n^2)$ dense attention connections. We show that it is a universal approximator for continuous set-to-set functions. Experimental results on point-clouds show comparable or better accuracy with significantly reduced computational complexity compared to the dense transformer or alternative sparse attention schemes.
Submitted 28 February, 2023;
originally announced February 2023.
-
LegendreTron: Uprising Proper Multiclass Loss Learning
Authors:
Kevin Lam,
Christian Walder,
Spiridon Penev,
Richard Nock
Abstract:
Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as properness, which asserts that Bayes' rule is optimal. Recent works have sought to learn losses and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99% significance on all datasets with greater than 10 classes.
Submitted 28 November, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Determinantal Point Process Likelihoods for Sequential Recommendation
Authors:
Yuli Liu,
Christian Walder,
Lexing Xie
Abstract:
Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on his/her previous sequence of actions. In the training process of recommender systems, the loss function plays an essential role in guiding the optimization of recommendation models to generate accurate suggestions for users. However, most existing sequential recommendation techniques focus on designing algorithms or neural network architectures, and few efforts have been made to tailor loss functions that fit naturally into the practical application scenario of sequential recommender systems.
Ranking-based losses, such as cross-entropy and Bayesian Personalized Ranking (BPR), are widely used in the sequential recommendation area. We argue that such objective functions suffer from two inherent drawbacks: i) the dependencies among elements of a sequence are overlooked in these loss formulations; ii) instead of balancing accuracy (quality) and diversity, generating accurate results alone has been overemphasized. We therefore propose two new loss functions based on the Determinantal Point Process (DPP) likelihood that can be adaptively applied to estimate the subsequent item or items. The DPP-distributed item set captures natural dependencies among temporal actions, and a quality vs. diversity decomposition of the DPP kernel pushes us to go beyond accuracy-oriented loss functions. Experimental results using the proposed loss functions on three real-world datasets show marked improvements over state-of-the-art sequential recommendation methods in both quality and diversity metrics.
Submitted 25 April, 2022;
originally announced April 2022.
-
SPA-VAE: Similar-Parts-Assignment for Unsupervised 3D Point Cloud Generation
Authors:
Shidi Li,
Christian Walder,
Miaomiao Liu
Abstract:
This paper addresses the problem of unsupervised parts-aware point cloud generation with learned parts-based self-similarity. Our SPA-VAE infers a set of latent canonical candidate shapes for any given object, along with a set of rigid body transformations for each such candidate shape to one or more locations within the assembled object. In this way, noisy samples on the surface of, say, each leg of a table, are effectively combined to estimate a single leg prototype. When parts-based self-similarity exists in the raw data, sharing data among parts in this way confers numerous advantages: modeling accuracy, appropriately self-similar generative outputs, precise in-filling of occlusions, and model parsimony. SPA-VAE is trained end-to-end using a variational Bayesian approach which uses the Gumbel-softmax trick for the shared part assignments, along with various novel losses to provide appropriate inductive biases. Quantitative and qualitative analyses on ShapeNet demonstrate the advantage of SPA-VAE.
Submitted 28 August, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
EditVAE: Unsupervised Part-Aware Controllable 3D Point Cloud Shape Generation
Authors:
Shidi Li,
Miaomiao Liu,
Christian Walder
Abstract:
This paper tackles the problem of parts-aware point cloud generation. Unlike existing works which require the point cloud to be segmented into parts a priori, our parts-aware editing and generation are performed in an unsupervised manner. We achieve this with a simple modification of the Variational Auto-Encoder which yields a joint model of the point cloud itself along with a schematic representation of it as a combination of shape primitives. In particular, we introduce a latent representation of the point cloud which can be decomposed into a disentangled representation for each part of the shape. These parts are in turn disentangled into both a shape primitive and a point cloud representation, along with a standardising transformation to a canonical coordinate system. The dependencies between our standardising transformations preserve the spatial dependencies between the parts in a manner that allows meaningful parts-aware point cloud generation and shape editing. In addition to the flexibility afforded by our disentangled representation, the inductive bias introduced by our joint modeling approach yields state-of-the-art experimental results on the ShapeNet dataset.
Submitted 30 March, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Dense Uncertainty Estimation
Authors:
Jing Zhang,
Yuchao Dai,
Mochu Xiang,
Deng-Ping Fan,
Peyman Moghadam,
Mingyi He,
Christian Walder,
Kaihao Zhang,
Mehrtash Harandi,
Nick Barnes
Abstract:
Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks. The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing. In this way, a specific set of weights is estimated while ignoring any uncertainty that may occur in the proper weight space. The latter introduces randomness into the framework, either by assuming a prior distribution over model parameters (i.e. Bayesian Neural Networks) or including latent variables (i.e. generative models) to explore the contribution of latent variables for model predictions, leading to stochastic predictions during testing. Unlike the former, which performs point estimation, the latter aims to estimate the prediction distribution, making it possible to estimate uncertainty, representing model ignorance about its predictions. We claim that conventional deterministic neural network based dense prediction tasks are prone to overfitting, leading to over-confident predictions, which is undesirable for decision making. In this paper, we investigate stochastic neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation. Specifically, we work on two types of uncertainty estimation solutions, namely ensemble based methods and generative model based methods, and explain their pros and cons while using them in fully-, semi-, and weakly-supervised frameworks. Due to the close connection between uncertainty estimation and model calibration, we also introduce how uncertainty estimation can be used for deep model calibration to achieve well-calibrated models, namely dense model calibration. Code and data are available at https://github.com/JingZhang617/UncertaintyEstimation.
Submitted 12 October, 2021;
originally announced October 2021.
-
Humanly Certifying Superhuman Classifiers
Authors:
Qiongkai Xu,
Christian Walder,
Chenchen Xu
Abstract:
Estimating the performance of a machine learning system is a longstanding challenge in artificial intelligence research. Today, this challenge is especially relevant given the emergence of systems which appear to increasingly outperform human beings. In some cases, this "superhuman" performance is readily demonstrated; for example by defeating legendary human players in traditional two player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators can make mistakes and be subjective. Evaluating the performance with respect to a genuine oracle may be more objective and reliable, even when querying the oracle is expensive or impossible. In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is unobserved. We develop a theory for estimating the accuracy compared to the oracle, using only imperfect human annotations for reference. Our analysis provides a simple recipe for detecting and certifying superhuman performance in this setting, which we believe will assist in understanding the stage of current research on classification. We validate the convergence of the bounds and the assumptions of our theory on carefully designed toy experiments with known oracles. Moreover, we demonstrate the utility of our theory by meta-analyzing large-scale natural language processing tasks, for which an oracle does not exist, and show that under our assumptions a number of models from recent years are with high probability superhuman.
Submitted 16 September, 2021;
originally announced September 2021.
-
Learning to Continually Learn Rapidly from Few and Noisy Data
Authors:
Nicholas I-Hsien Kuo,
Mehrtash Harandi,
Nicolas Fourrier,
Christian Walder,
Gabriela Ferraro,
Hanna Suominen
Abstract:
Neural networks suffer from catastrophic forgetting and are unable to sequentially learn new tasks without guaranteed stationarity in data distribution. Continual learning could be achieved via replay -- by concurrently training on externally stored old data while learning a new task. However, replay becomes less effective when each past task is allocated less memory. To overcome this difficulty, we supplemented replay mechanics with meta-learning for rapid knowledge acquisition. By employing a meta-learner, which learns a learning rate per parameter per past task, we found that base learners produced strong results when less memory was available. Additionally, our approach inherited several meta-learning advantages for continual learning: it demonstrated strong robustness when continually learning in the presence of noise and brought base learners to higher accuracy in fewer updates.
Submitted 6 March, 2021;
originally announced March 2021.
-
TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
Authors:
Minchao Wu,
Michael Norrish,
Christian Walder,
Amir Dezfouli
Abstract:
We propose a novel approach to interactive theorem-proving (ITP) using deep reinforcement learning. The proposed framework is able to learn proof search strategies as well as tactic and arguments prediction in an end-to-end manner. We formulate the process of ITP as a Markov decision process (MDP) in which each state represents a set of potential derivation paths. This structure allows us to introduce a novel backtracking mechanism which enables the agent to efficiently discard (predicted) dead-end derivations and restart from promising alternatives. We implement the framework in the HOL4 theorem prover. Experimental results show that the framework outperforms existing automated theorem provers (i.e., hammers) available in HOL4 when evaluated on unseen problems. We further elaborate the role of key components of the framework using ablation studies.
Submitted 17 June, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
MTL2L: A Context Aware Neural Optimiser
Authors:
Nicholas I-Hsien Kuo,
Mehrtash Harandi,
Nicolas Fourrier,
Christian Walder,
Gabriela Ferraro,
Hanna Suominen
Abstract:
Learning to learn (L2L) trains a meta-learner to assist the learning of a task-specific base learner. Previously, it was shown that a meta-learner could learn the direct rules to update learner parameters; and that the learnt neural optimiser updated learners more rapidly than handcrafted gradient-descent methods. However, we demonstrate that previous neural optimisers were limited to update learners on one designated dataset. In order to address input-domain heterogeneity, we introduce Multi-Task Learning to Learn (MTL2L), a context aware neural optimiser which self-modifies its optimisation rules based on input data. We show that MTL2L is capable of updating learners to classify on data of an unseen input-domain at the meta-testing phase.
Submitted 18 July, 2020;
originally announced July 2020.
-
All your loss are belong to Bayes
Authors:
Christian Walder,
Richard Nock
Abstract:
Loss functions are a cornerstone of machine learning and the starting point of most algorithms. Statistics and Bayesian decision theory have contributed, via properness, to elicit over the past decades a wide set of admissible losses in supervised learning, to which most popular choices belong (logistic, square, Matsushita, etc.). Rather than making a potentially biased ad hoc choice of the loss, there has recently been a boost in efforts to fit the loss to the domain at hand while training the model itself. The key approaches fit a canonical link, a function which monotonically relates the closed unit interval to R and can provide a proper loss via integration. In this paper, we rely on a broader view of proper composite losses and a recent construct from information geometry, source functions, whose fitting alleviates constraints faced by canonical links. We introduce a trick on squared Gaussian Processes to obtain a random process whose paths are compliant source functions with many desirable properties in the context of link estimation. Experimental results demonstrate substantial improvements over the state of the art.
Submitted 5 November, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Quantile Propagation for Wasserstein-Approximate Gaussian Processes
Authors:
Rui Zhang,
Christian J. Walder,
Edwin V. Bonilla,
Marian-Andrei Rizoiu,
Lexing Xie
Abstract:
Approximate inference techniques are the cornerstone of probabilistic methods based on Gaussian process priors. Despite this, most work approximately optimizes standard divergence measures such as the Kullback-Leibler (KL) divergence, which lack the basic desiderata for the task at hand, while chiefly offering merely technical convenience. We develop a new approximate inference method for Gaussian process models which overcomes the technical challenges arising from abandoning these convenient divergences. Our method, dubbed Quantile Propagation (QP), is similar to expectation propagation (EP) but minimizes the $L_2$ Wasserstein distance (WD) instead of the KL divergence. The WD exhibits all the required properties of a distance metric, while respecting the geometry of the underlying sample space. We show that QP matches quantile functions rather than moments as in EP and has the same mean update but a smaller variance update than EP, thereby alleviating EP's tendency to over-estimate posterior variances. Crucially, despite the significant complexity of dealing with the WD, QP has the same favorable locality property as EP, and thereby admits an efficient algorithm. Experiments on classification and Poisson regression show that QP outperforms both EP and variational Bayes.
Submitted 5 November, 2020; v1 submitted 21 December, 2019;
originally announced December 2019.
-
Computer Assisted Composition in Continuous Time
Authors:
Chamin Hewa Koneputugodage,
Rhys Healy,
Sean Lamont,
Ian Mallett,
Matt Brown,
Matt Walters,
Ushini Attanayake,
Libo Zhang,
Roger T. Dean,
Alexander Hunter,
Charles Gretton,
Christian Walder
Abstract:
We address the problem of combining sequence models of symbolic music with user defined constraints. For typical models this is non-trivial as only the conditional distribution of each symbol given the earlier symbols is available, while the constraints correspond to arbitrary times. Previously this has been addressed by assuming a discrete time model of fixed rhythm. We generalise to continuous time and arbitrary rhythm by introducing a simple, novel, and efficient particle filter scheme, applicable to general continuous time point processes. Extensive experimental evaluations demonstrate that in comparison with a more traditional beam search baseline, the particle filter exhibits superior statistical properties and yields more agreeable results in an extensive human listening test experiment.
Submitted 10 September, 2019;
originally announced September 2019.
-
Variational Inference for Sparse Gaussian Process Modulated Hawkes Process
Authors:
Rui Zhang,
Christian Walder,
Marian-Andrei Rizoiu
Abstract:
The Hawkes process (HP) has been widely applied to modeling self-exciting events including neuron spikes, earthquakes and tweets. To avoid designing a parametric triggering kernel and to quantify prediction confidence, the non-parametric Bayesian HP has been proposed. However, inference in such models suffers from poor scalability or slow convergence. In this paper, we aim to solve both problems. Specifically, we first propose a new non-parametric Bayesian HP in which the triggering kernel is modeled as a squared sparse Gaussian process. Then, we propose a novel variational inference schema for model optimization. We employ the branching structure of the HP so that maximization of the evidence lower bound (ELBO) is tractable by the expectation-maximization algorithm. We propose a tighter ELBO which improves the fitting performance. Further, we accelerate the novel variational inference schema to linear time complexity by leveraging the stationarity of the triggering kernel. Unlike prior acceleration methods, ours enjoys higher efficiency. Finally, we use synthetic data and two large social media datasets to evaluate our method. We show that our approach outperforms state-of-the-art non-parametric frequentist and Bayesian methods. We validate the efficiency of our accelerated variational inference schema and the practical utility of our tighter ELBO for model selection. We observe that the tighter ELBO outperforms the common one in model selection.
Submitted 22 November, 2019; v1 submitted 24 May, 2019;
originally announced May 2019.
-
New Tricks for Estimating Gradients of Expectations
Authors:
Christian J. Walder,
Paul Roussel,
Richard Nock,
Cheng Soon Ong,
Masashi Sugiyama
Abstract:
We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples. The simplest example of our new estimator, dubbed the fundamental trick estimator, is shown to arise from either a) introducing and approximating an integral representation based on the fundamental theorem of calculus, or b) applying the reparameterisation trick to an implicit parameterisation under infinitesimal perturbation of the parameters. From the former perspective we generalise to a reproducing kernel Hilbert space representation, giving rise to a locality parameter in the pairwise interactions mentioned above, yielding our representer trick estimator. The resulting estimators are unbiased and shown to offer an independent component of useful information in comparison with the log-derivative estimator. We provide a novel theoretical analysis which further characterises the variance reduction afforded by the new techniques. Promising analytical and numerical examples confirm the theory and intuitions behind the new estimators.
Submitted 19 April, 2022; v1 submitted 31 January, 2019;
originally announced January 2019.
-
Efficient Non-parametric Bayesian Hawkes Processes
Authors:
Rui Zhang,
Christian Walder,
Marian-Andrei Rizoiu,
Lexing Xie
Abstract:
In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show that our methods are able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pet videos.
Submitted 12 April, 2022; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Monge blunts Bayes: Hardness Results for Adversarial Training
Authors:
Zac Cranko,
Aditya Krishna Menon,
Richard Nock,
Cheng Soon Ong,
Zhan Shi,
Christian Walder
Abstract:
The last few years have seen a staggering number of empirical studies of the robustness of neural networks in a model of adversarial perturbations of their inputs. Most rely on an adversary which carries out local modifications within prescribed balls. None however has so far questioned the broader picture: how to frame a resource-bounded adversary so that it can be severely detrimental to learning, a non-trivial problem which entails at a minimum the choice of loss and classifiers.
We suggest a formal answer for losses that satisfy the minimal statistical requirement of being proper. We pin down a simple sufficient property for any given class of adversaries to be detrimental to learning, involving a central measure of "harmfulness" which generalizes the well-known class of integral probability metrics. A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss. When classifiers are Lipschitz (a now popular approach in adversarial training), this optimisation resorts to optimal transport to make a low-budget compression of class marginals. Toy experiments reveal a finding recently separately observed: training against a sufficiently budgeted adversary of this kind improves generalization.
Submitted 7 May, 2019; v1 submitted 8 June, 2018;
originally announced June 2018.
-
Self-Bounded Prediction Suffix Tree via Approximate String Matching
Authors:
Dongwoo Kim,
Christian Walder
Abstract:
Prediction suffix trees (PST) provide an effective tool for sequence modelling and prediction. Current prediction techniques for PSTs rely on exact matching between the suffix of the current sequence and the previously observed sequence. We present a provably correct algorithm for learning a PST with approximate suffix matching by relaxing the exact matching condition. We then present a self-bounded enhancement of our algorithm where the depth of suffix tree grows automatically in response to the model performance on a training sequence. Through experiments on synthetic datasets as well as three real-world datasets, we show that the approximate matching PST results in better predictive performance than the other variants of PST.
Submitted 6 August, 2018; v1 submitted 9 February, 2018;
originally announced February 2018.
-
Neural Dynamic Programming for Musical Self Similarity
Authors:
Christian J. Walder,
Dongwoo Kim
Abstract:
We present a neural sequence model designed specifically for symbolic music. The model is based on a learned edit distance mechanism which generalises a classic recursion from computer science, leading to a neural dynamic program. Repeated motifs are detected by learning the transformations between them. We represent the arising computational dependencies using a novel data structure, the edit tree; this perspective suggests natural approximations which afford the scaling up of our otherwise cubic time algorithm. We demonstrate our model on real and synthetic data; in all cases it out-performs a strong stacked long short-term memory benchmark.
Submitted 28 August, 2018; v1 submitted 9 February, 2018;
originally announced February 2018.
-
Fast Bayesian Intensity Estimation for the Permanental Process
Authors:
Christian J. Walder,
Adrian N. Bishop
Abstract:
The Cox process is a stochastic process which generalises the Poisson process by letting the underlying intensity function itself be a stochastic process. In this paper we present a fast Bayesian inference scheme for the permanental process, a Cox process under which the square root of the intensity is a Gaussian process. In particular we exploit connections with reproducing kernel Hilbert spaces, to derive efficient approximate Bayesian inference algorithms based on the Laplace approximation to the predictive distribution and marginal likelihood. We obtain a simple algorithm which we apply to toy and real-world problems, obtaining orders of magnitude speed improvements over previous work.
Submitted 19 June, 2018; v1 submitted 12 January, 2017;
originally announced January 2017.
-
Computer Assisted Composition with Recurrent Neural Networks
Authors:
Christian Walder,
Dongwoo Kim
Abstract:
Sequence modeling with neural networks has led to powerful models of symbolic music data. We address the problem of exploiting these models to reach creative musical goals, by combining them with human input. To this end we generalise previous work, which sampled Markovian sequence models under the constraint that the sequence belong to the language of a given finite state machine provided by the human. We consider more expressive non-Markov models, thereby requiring approximate sampling which we provide in the form of an efficient sequential Monte Carlo method. In addition we provide and compare with a beam search strategy for conditional probability maximisation.
Our algorithms are capable of convincingly re-harmonising famous musical works. To demonstrate this we provide visualisations, quantitative experiments, a human listening test and audio examples. We find both the sampling and optimisation procedures to be effective, yet complementary in character. For the case of highly permissive constraint sets, we find that sampling is to be preferred due to the overly regular nature of the optimisation based results. The generality of our algorithms permits countless other creative applications.
Submitted 29 September, 2017; v1 submitted 30 November, 2016;
originally announced December 2016.
-
Symbolic Music Data Version 1.0
Authors:
Christian Walder
Abstract:
In this document, we introduce a new dataset designed for training machine learning models of symbolic music data. Five datasets are provided, one of which is from a newly collected corpus of 20K midi files. We describe our preprocessing and cleaning pipeline, which includes the exclusion of a number of files based on scores from a previously developed probabilistic machine learning model. We also define training, testing and validation splits for the new dataset, based on a clustering scheme which we also describe. Some simple histograms are included.
Submitted 8 June, 2016;
originally announced June 2016.
-
Modelling Symbolic Music: Beyond the Piano Roll
Authors:
Christian Walder
Abstract:
In this paper, we consider the problem of probabilistically modelling symbolic music data. We introduce a representation which reduces polyphonic music to a univariate categorical sequence. In this way, we are able to apply state of the art natural language processing techniques, namely the long short-term memory sequence model. The representation we employ permits arbitrary rhythmic structure, which we assume to be given. We show that our model is effective on four out of four piano roll based benchmark datasets. We further improve our model by augmenting our training data set with transpositions of the original pieces through all musical keys, thereby convincingly advancing the state of the art on these benchmark problems. We also fit models to music which is unconstrained in its rhythmic structure, discuss the properties of this model, and provide musical samples which are more sophisticated than previously possible with this class of recurrent neural network sequence models. We also provide our newly preprocessed data set of non piano-roll music data.
Submitted 4 June, 2016;
originally announced June 2016.
-
Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2
Authors:
Christian Walder
Abstract:
In this note we briefly describe our Cholesky modification algorithm for streaming multiprocessor architectures. Our implementation is available in C++ with Matlab binding, using CUDA to utilise the graphics processing unit (GPU). Only limited speed-ups are possible due to the bandwidth-bound nature of the problem. Furthermore, a complex dependency pattern must be obeyed, requiring multiple kernels to be launched. Nonetheless, this makes for an interesting problem, and our approach can reduce the computation time by a factor of around 7 for matrices of size 5000 by 5000 and k=16, in comparison with the LINPACK suite running on a CPU of comparable vintage. Much larger problems can be handled, however, due to the O(n) scaling in required GPU memory of our method.
Submitted 4 November, 2010;
originally announced November 2010.
-
Semi-Supervised Kernel PCA
Authors:
Christian Walder,
Ricardo Henao,
Morten Mørup,
Lars Kai Hansen
Abstract:
We present three generalisations of Kernel Principal Components Analysis (KPCA) which incorporate knowledge of the class labels of a subset of the data points. The first, MV-KPCA, penalises within-class variances, similarly to Fisher discriminant analysis. The second, LSKPCA, is a hybrid of least squares regression and kernel PCA. The final one, LR-KPCA, is an iteratively reweighted version of the previous which achieves a sigmoid loss function on the labeled points. We provide a theoretical risk bound as well as illustrative experiments on real and toy data sets.
Submitted 8 August, 2010;
originally announced August 2010.