Distance metric learning with eigenvalue optimization
The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig which is shown to be equivalent to a well-known ...
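As a quick illustration of the object being learned, here is a minimal NumPy sketch of the squared Mahalanobis distance parameterized by a PSD matrix M; the function name is illustrative, and this is not DML-eig's API or its eigenvalue optimization procedure.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y).

    M must be symmetric positive semidefinite for d_M to be a valid
    (pseudo-)metric; metric learning methods optimize over this M.
    """
    d = x - y
    return float(d @ M @ d)

# With M = I the distance reduces to the squared Euclidean distance.
x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(mahalanobis_sq(x, y, np.eye(2)))  # 2.0
```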
Conditional likelihood maximisation: a unifying framework for information theoretic feature selection
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit ...
Plug-in approach to active learning
We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by the proposed method over a broad ...
Refinement of operator-valued reproducing kernels
This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is ...
An active learning algorithm for ranking from pairwise preferences with an almost optimal query complexity
Given a set V of n elements, we wish to linearly order them given pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise ...
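To make the objective concrete, here is a small sketch of counting how many pairwise preference labels a candidate linear order violates; the function name and the toy preference set are illustrative, not the paper's algorithm.

```python
def disagreements(order, prefs):
    """Count pairwise preference labels violated by a linear order.

    order: a list giving the proposed linear order of the elements.
    prefs: iterable of (u, v) pairs meaning 'u is preferred to v';
           these labels may be non-transitive.
    """
    pos = {v: i for i, v in enumerate(order)}
    return sum(pos[u] > pos[v] for u, v in prefs)

# A cyclic (non-transitive) preference set: no order satisfies all labels.
prefs = [("a", "b"), ("b", "c"), ("c", "a")]
print(disagreements(["a", "b", "c"], prefs))  # 1 violation is optimal here
```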
Optimal distributed online prediction using mini-batches
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with ...
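A minimal serial sketch of the mini-batch update: each step averages per-example gradients over a batch, which is the quantity that, in the distributed setting, workers would compute in parallel before the update. The function name and the least-squares example are illustrative assumptions, not the paper's distributed implementation.

```python
import numpy as np

def minibatch_sgd(grad, w0, data, batch_size=32, lr=0.05, epochs=5, seed=0):
    """Serial simulation of mini-batch stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        idx = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            batch = [data[i] for i in idx[start:start + batch_size]]
            # Average per-example gradients over the mini-batch.
            w -= lr * np.mean([grad(w, z) for z in batch], axis=0)
    return w

# Least-squares example: each z is an (x, y) pair.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])
sq_grad = lambda w, z: 2.0 * (z[0] @ w - z[1]) * z[0]
print(minibatch_sgd(sq_grad, np.zeros(3), list(zip(X, y))))
```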
Active clustering of biological sequences
Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that, given a point ...
Multi kernel learning with online-batch optimization
In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a ...
Active learning via perfect selective classification
We discover a strong relation between two known learning models: stream-based active learning and perfect selective classification (an extreme case of 'classification with a reject option'). For these models, restricted to the realizable case, we show a ...
Random search for hyper-parameter optimization
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. ...
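A toy comparison under an assumed synthetic validation score: with the same budget of 16 trials, random search covers each hyper-parameter dimension with 16 distinct values while a 4x4 grid covers only 4 per dimension. The `evaluate` function is a stand-in, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(lr, reg):
    """Stand-in validation score; replace with real model training."""
    return -(np.log10(lr) + 2) ** 2 - (np.log10(reg) + 4) ** 2

# Grid search: 4 x 4 = 16 trials, only 4 distinct values per dimension.
grid_scores = [evaluate(lr, reg)
               for lr in 10.0 ** np.linspace(-4, 0, 4)
               for reg in 10.0 ** np.linspace(-6, -2, 4)]

# Random search: the same budget of 16 independent draws, giving
# 16 distinct values in every individual dimension.
rand_scores = [evaluate(10.0 ** rng.uniform(-4, 0),
                        10.0 ** rng.uniform(-6, -2))
               for _ in range(16)]

print(max(grid_scores), max(rand_scores))
```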
Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics
We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we focus on the situation where the model probability density function is unnormalized. That is, ...
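A minimal sketch of the noise-contrastive estimation idea for a 1-D unnormalized Gaussian: fit the model, including a parameter that absorbs the unknown normalizing constant, by logistic discrimination between data samples and noise samples. The model family, noise distribution, and optimizer here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x_data = rng.normal(0.0, 0.5, size=2000)   # data from N(0, 0.5^2)
x_noise = rng.normal(0.0, 1.0, size=2000)  # noise from N(0, 1)

def log_model(x, theta):
    # Unnormalized model: log p(x; prec, c) = -0.5 * prec * x^2 + c,
    # where c absorbs the (unknown) normalizing constant.
    prec, c = theta
    return -0.5 * prec * x**2 + c

def nce_loss(theta):
    # Negative NCE objective: logistic discrimination of data vs noise,
    # with log-odds G(x) = log p_model(x) - log p_noise(x).
    g_d = log_model(x_data, theta) - norm.logpdf(x_data)
    g_n = log_model(x_noise, theta) - norm.logpdf(x_noise)
    return np.mean(np.logaddexp(0.0, -g_d)) + np.mean(np.logaddexp(0.0, g_n))

theta_hat = minimize(nce_loss, x0=np.array([1.0, 0.0])).x
print(theta_hat)  # precision ~ 1/0.25 = 4, c ~ the log-normalizer
```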
Bounding the probability of error for high precision optical character recognition
We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be identified with near certainty, they can be conditioned upon, allowing ...
Minimax-optimal rates for sparse additive models over kernel classes via convex programming
Sparse additive models are families of d-variate functions with the additive decomposition f* = Σ_{j∈S} f_j*, where S is an unknown subset of cardinality s < d. In this paper, we consider the case where each univariate component function f_j* lies in a ...
Online learning in the embedded manifold of low-rank matrices
When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing ...
Multi-assignment clustering for boolean data
We propose a probabilistic model for clustering Boolean data where an object can be simultaneously assigned to multiple clusters. By explicitly modeling the underlying generative process that combines the individual source emissions, highly structured ...
Eliminating spammers and ranking annotators for crowdsourced labeling tasks
With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a data set labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting ...
Metric and kernel learning using a linear transformation
Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the ...
MULTIBOOST: a multi-purpose boosting package
The MULTIBOOST package provides a fast C++ implementation of multi-class/multi-label/multi-task boosting algorithms. It is based on ADABOOST.MH, but it also implements popular cascade classifiers and FILTERBOOST. The package contains common multi-class ...
ML-Flex: a flexible toolbox for performing classification analyses in parallel
Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet ...
A primal-dual convergence analysis of boosting
Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad ...
Non-sparse multiple kernel fisher discriminant analysis
Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓp ...
Learning algorithms for the classification restricted Boltzmann machine
Recent developments have demonstrated the capacity of restricted Boltzmann machines (RBM) to be powerful generative models, able to extract useful features from input data or construct deep artificial neural networks. In such settings, the RBM only ...
Structured sparsity and generalization
We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of ...
A case study on meta-generalising: a Gaussian processes approach
We propose a novel model for meta-generalisation, that is, performing prediction on novel tasks based on information from multiple different but related tasks. The model is based on two coupled Gaussian processes with structured covariance function; one ...
A kernel two-sample test
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions ...
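A small sketch of the (biased) squared maximum mean discrepancy statistic with an RBF kernel, the quantity underlying the test; the function name, kernel bandwidth, and sample sizes are illustrative.

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with RBF kernel
    k(a, b) = exp(-gamma * ||a - b||^2):

        MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    """
    def k(A, B):
        sq = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-gamma * sq)

    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_biased(rng.normal(size=(200, 2)),
                   rng.normal(1.0, 1.0, size=(200, 2)))
print(same, diff)  # diff should be markedly larger
```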
GPLP: a local and parallel computation toolbox for Gaussian process regression
This paper presents the Getting Started-style documentation for the local and parallel computation toolbox for Gaussian process regression (GPLP), an open-source software package written in Matlab (but also compatible with Octave). The working ...
Exact covariance thresholding into connected components for large-scale graphical lasso
We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected ...
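A sketch of the screening step the abstract describes, assuming SciPy's connected_components: threshold the off-diagonal entries of the sample covariance at λ and read off the connected components of the resulting graph. The helper name and the value of λ are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def covariance_components(S, lam):
    """Threshold |S_ij| at lam (off-diagonal only) and return the
    connected components of the resulting sample covariance graph.
    Per the paper, these components coincide with the block structure
    of the graphical-lasso solution at the same lambda, so each block
    can be solved independently.
    """
    adj = np.abs(S) > lam
    np.fill_diagonal(adj, False)
    n_comp, labels = connected_components(csr_matrix(adj), directed=False)
    return n_comp, labels

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
S = np.cov(X, rowvar=False)
print(covariance_components(S, lam=0.15))
```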
Algorithms for learning kernels based on centered alignment
This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon ...
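A sketch of centered kernel alignment, the quantity these algorithms are built around: center each Gram matrix with H = I - (1/n)11^T and take the normalized Frobenius inner product. The helper name is illustrative.

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F),
    where Kc = H K H and H = I - (1/n) 11^T centers the kernel.
    """
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# A kernel perfectly aligned with the target labels yy^T scores ~1.
y = np.array([1.0, 1.0, -1.0, -1.0])
print(centered_alignment(np.outer(y, y), np.outer(y, y)))  # ~1.0
```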
Causal bounds and observable constraints for non-deterministic models
Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. ...
NIMFA: a python library for nonnegative matrix factorization
NIMFA is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. It supports ...
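Since NIMFA's own interface isn't shown here, the following is a plain NumPy sketch of the factorization it implements: basic NMF via Lee-Seung multiplicative updates. This is not NIMFA's API.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Basic NMF via multiplicative updates: V ~ W H with W, H >= 0,
    minimizing the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H, then W
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(20, 10)))
W, H = nmf(V, rank=3)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```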