Louis Dorard

University College London, Computer Science, Graduate Student

My work is in Machine Learning, in particular in online learning, bandit problems and searching large spaces. I am also interested in applications of ML techniques to the web (personalisation of content), and to music (composition, performance, music information retrieval).

In my PhD, I am studying bandit algorithms as a way to focus the exploration of large spaces and to make search as quick as possible. I am especially interested in the use of probabilistic models (Gaussian Processes) for the search of tree-structured spaces, such as sequences of possible actions to take in an environment (for planning in Markov Decision Processes).

I am applying these techniques to:
- Content-Based Image Retrieval with Relevance Feedback
- Sequence Labelling and performing music automatically, by having the computer "label" music notes with performance parameters (such as loudness and duration) in a way that renders the music expressively.

Please read my research summary for more information. I am also the main organiser of the 2011 Exploration & Exploitation Challenge and associated workshop at ICML (http://explo.cs.ucl.ac.uk/).
Supervisors: John Shawe-Taylor

Papers by Louis Dorard

Bandit algorithms for searching large spaces

Bandit games consist of single-state environments in which an agent must sequentially choose actions to take, for which rewards are given. Since the objective is to maximise the cumulated reward, the agent naturally seeks to build a model of the relationship between actions and rewards. The agent must both choose uncertain actions in order to improve its model (exploration), and actions that are believed to yield high rewards according to the model (exploitation). The choice of an action to take is called a play of an arm of the bandit, and the total number of plays may or may not be known in advance.
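
As an illustration of the exploration-exploitation trade-off described above, here is a minimal sketch of the classical UCB1 rule on a bandit with Bernoulli rewards. It is not code from the thesis: the function name `ucb1`, the arm means and the number of plays are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, n_plays, seed=0):
    """Play a Bernoulli bandit with UCB1; return (play counts, total reward).

    `arm_means` are hypothetical success probabilities, hidden from the agent.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of plays of each arm
    sums = [0.0] * n_arms   # summed reward of each arm
    total = 0.0
    for t in range(1, n_plays + 1):
        if t <= n_arms:
            arm = t - 1     # play every arm once to initialise the model
        else:
            # Exploitation term (empirical mean) plus exploration bonus,
            # which is large for arms that have been played rarely.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8], n_plays=2000)
```

The model here is just one empirical mean per arm, so arms are treated as independent; the bonus term shrinks as an arm is played more often, which shifts the agent from exploration towards exploitation.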

Algorithms designed to handle the exploration-exploitation dilemma were initially motivated by problems with rather small numbers of actions. But the ideas they were based on have been extended to cases where the number of actions to choose from is much larger than the maximum possible number of plays. Several problems fall into this setting, such as information retrieval with relevance feedback, where the system must learn what a user is looking for while serving relevant documents often enough, but also global optimisation, where the search for an optimum is done by selecting where to acquire potentially expensive samples of a target function. All have in common the search of large spaces.

In this thesis, we focus on an algorithm based on the Gaussian Processes probabilistic model, often used in Bayesian optimisation, and the Upper Confidence Bound action-selection heuristic that is popular in bandit algorithms. In addition to demonstrating the advantages of the GP-UCB algorithm on an image retrieval problem, we show how it can be adapted in order to search tree-structured spaces. We provide an efficient implementation, theoretical guarantees on the algorithm’s performance, and empirical evidence that it handles large branching factors better than previous bandit-based algorithms, on synthetic trees.
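
For reference, the action-selection rule that GP-UCB is usually described with can be written as below. The notation is an assumption made here, not taken from the thesis: $\mathcal{X}$ is the set of arms, $\mu_{t-1}$ and $\sigma_{t-1}$ are the GP posterior mean and standard deviation after $t-1$ plays, and $\beta_t$ is a confidence parameter that sets the weight given to exploration.

```latex
x_t = \operatorname*{arg\,max}_{x \in \mathcal{X}} \;
      \mu_{t-1}(x) + \sqrt{\beta_t}\,\sigma_{t-1}(x)
```

The first term favours arms believed to yield high rewards (exploitation), and the second favours arms whose reward is still uncertain under the model (exploration).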

Gaussian Process Modelling of Dependencies in Multi-Armed Bandit Problems

Multi-armed bandit problems, in analogy with slot machines in casinos, are problems in which one has to choose actions sequentially (pull arms) in order to maximise a cumulated reward (gain), with no initial knowledge of the distribution of actions/arms’ rewards. We propose a general framework for handling dependencies across arms, based on a new assumption on the mean-reward function, namely that it is drawn from a Gaussian Process (GP) with a given arm covariance matrix. We show on a toy problem that this allows us to perform better than the popular UCB bandit algorithm, which considers arms to be independent.
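
To make the idea of dependent arms concrete, here is a small sketch (not the paper's implementation) of a UCB-style bandit over a finite set of arms whose mean rewards have a zero-mean Gaussian prior with a given covariance matrix. The names `condition` and `gp_ucb`, the squared-exponential covariance, the noise variance and the constant `beta` are all assumptions made for the example. Each observation is absorbed with a rank-one Gaussian conditioning step, so no matrix inversion is needed.

```python
import math
import random


def condition(mean, cov, arm, y, noise_var):
    """Condition a Gaussian belief over all arms on one noisy reward of `arm`."""
    n = len(mean)
    col = [cov[a][arm] for a in range(n)]      # covariance with the played arm
    denom = cov[arm][arm] + noise_var
    gain = [c / denom for c in col]
    residual = y - mean[arm]
    new_mean = [mean[a] + gain[a] * residual for a in range(n)]
    new_cov = [[cov[a][b] - gain[a] * col[b] for b in range(n)]
               for a in range(n)]
    return new_mean, new_cov


def gp_ucb(prior_cov, true_means, n_plays, noise_var=0.1, beta=4.0, seed=0):
    """Run a GP-based UCB rule; return (play counts, final posterior mean).

    `true_means` are hypothetical and hidden from the agent; `beta` is an
    assumed constant exploration weight.
    """
    rng = random.Random(seed)
    n = len(prior_cov)
    mean = [0.0] * n
    cov = [row[:] for row in prior_cov]
    counts = [0] * n
    for _ in range(n_plays):
        # Upper confidence bound: posterior mean plus scaled posterior std.
        arm = max(
            range(n),
            key=lambda a: mean[a] + math.sqrt(beta * max(cov[a][a], 0.0)),
        )
        y = true_means[arm] + rng.gauss(0.0, math.sqrt(noise_var))
        counts[arm] += 1
        mean, cov = condition(mean, cov, arm, y, noise_var)
    return counts, mean


# Toy problem: ten arms on a line, with nearby arms strongly correlated.
positions = [i / 9 for i in range(10)]
cov = [[math.exp(-((p - q) ** 2) / (2 * 0.2 ** 2)) for q in positions]
       for p in positions]
true_means = [math.sin(3.0 * p) for p in positions]
counts, mean = gp_ucb(cov, true_means, n_plays=200)
```

Because the off-diagonal entries of the covariance matrix are non-zero, a single play updates the belief about every correlated arm, which is what an independent-arm rule such as UCB cannot do.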

Gaussian Process Bandits: Similarities with Previous Bandit Algorithms

In multi-armed bandit problems, one has to choose actions sequentially (pull arms) in order to maximise a cumulated reward, with no initial knowledge of the distribution of actions/arms' rewards. The increasingly popular Gaussian Process Bandits framework makes it possible to handle dependencies across arms and is based on the assumption that the mean-reward function is drawn from a Gaussian Process (GP). In this paper we describe the GPB algorithm and show that the Gaussian Process posterior mean and variance formulae are reminiscent of two previous bandit algorithms: LinRel (dependent arms) and UCB (independent arms).
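
The posterior formulae referred to above are the standard ones for GP regression with a zero prior mean. The notation below is an assumption made here rather than the paper's own: arms $x_1, \dots, x_t$ have been played with rewards $\mathbf{y}_t$, $k$ is the covariance function, $K_t = [k(x_i, x_j)]_{i,j}$, $\mathbf{k}_t(x) = [k(x_1, x), \dots, k(x_t, x)]^\top$, and $\sigma_{\mathrm{n}}^2$ is the noise variance.

```latex
\mu_t(x)      = \mathbf{k}_t(x)^\top \left(K_t + \sigma_{\mathrm{n}}^2 I\right)^{-1} \mathbf{y}_t
\\
\sigma_t^2(x) = k(x, x) - \mathbf{k}_t(x)^\top \left(K_t + \sigma_{\mathrm{n}}^2 I\right)^{-1} \mathbf{k}_t(x)
```

One way to see the link with UCB is the independent-arm special case $k(x, x') = \mathbb{1}[x = x']$. If arm $x$ has been played $n$ times with summed reward $S$, the formulae reduce to:

```latex
\mu_t(x) = \frac{S}{n + \sigma_{\mathrm{n}}^2},
\qquad
\sigma_t^2(x) = \frac{\sigma_{\mathrm{n}}^2}{n + \sigma_{\mathrm{n}}^2}
```

The mean is a shrunk empirical average and the standard deviation decays roughly as $1/\sqrt{n}$, which mirrors the empirical-mean-plus-bonus structure of UCB.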
