Statistical decision making for optimal budget allocation in crowd labeling
It has become increasingly popular to obtain machine learning labels through commercial crowdsourcing services. The crowdsourcing workers or annotators are paid for each label they provide, but the task requester usually has only a limited amount of the ...
Simultaneous pursuit of sparseness and rank structures for matrix decomposition
In multi-response regression, pursuit of two different types of structures is essential to battle the curse of dimensionality. In this paper, we seek a sparsest decomposition representation of a parameter matrix in terms of a sum of sparse and low rank ...
Statistical topological data analysis using persistence landscapes
We define a new topological summary for data that we call the persistence landscape. Since this summary lies in a vector space, it is easy to combine with tools from statistics and machine learning, in contrast to the standard topological summaries. ...
Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework
Stochastic multiplicity automata (SMA) are weighted finite automata that generalize probabilistic automata. They have been used in the context of probabilistic grammatical inference. Observable operator models (OOMs) are a generalization of hidden ...
SAMOA: scalable advanced massive online analysis
SAMOA (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and ...
Online learning via sequential complexities
We consider the problem of sequential prediction and provide tools to study the minimax value of the associated game. Classical statistical learning theory provides several useful complexity measures to study learning with i.i.d. data. Our proposed ...
Learning transformations for clustering and classification
A low-rank transformation learning framework for subspace clustering and classification is proposed here. Many high-dimensional data, such as face images and motion sequences, approximately lie in a union of low-dimensional subspaces. The corresponding ...
Multi-layered gesture recognition with Kinect
This paper proposes a novel multi-layered gesture recognition method with Kinect. We explore the essential linguistic characters of gestures: the components concurrent character and the sequential organization character, in a multi-layered framework, ...
Multimodal gesture recognition via multiple hypotheses rescoring
We present a new framework for multimodal gesture recognition that is based on a multiple hypotheses rescoring fusion scheme. We specifically deal with a demanding Kinect-based multimodal data set, introduced in a recent gesture recognition challenge (...
An asynchronous parallel stochastic coordinate descent algorithm
We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity ...
Geometric intuition and algorithms for Eν-SVM
In this work we address the Eν-SVM model proposed by Pérez-Cruz et al. as an extension of the traditional ν support vector classification model (ν-SVM). Through an enhancement of the range of admissible values for the regularization parameter ν, the Eν-...
Composite self-concordant minimization
We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework ...
Network granger causality with inherent grouping structure
The problem of estimating high-dimensional network models arises naturally in the analysis of many biological and socio-economic systems. In this work, we aim to learn a network structure from temporal panel data, employing the framework of Granger ...
Iterative and active graph clustering using trace norm minimization without cluster size constraints
This paper investigates graph clustering under the planted partition model in the presence of small clusters. Traditional results dictate that for an algorithm to provably correctly recover the underlying clusters, all clusters must be sufficiently ...
A classification module for genetic programming algorithms in JCLEC
JCLEC-Classification is a usable and extensible open source library for genetic programming classification algorithms. It houses implementations of rule-based methods for classification based on genetic programming, supporting multiple model ...
AD3: alternating directions dual decomposition for MAP inference in graphical models
We present AD3, a new algorithm for approximate maximum a posteriori (MAP) inference on factor graphs, based on the alternating directions method of multipliers. Like other dual decomposition algorithms, AD3 has a modular architecture, where local ...
Introducing CURRENNT: the Munich open-source CUDA recurrent neural network toolkit
In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA's Compute Unified Device Architecture (CUDA). CURRENNT supports uni- and ...
The flare package for high dimensional linear regression and precision matrix estimation in R
This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, lq Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). ...
Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima
We provide novel theoretical results regarding local optima of regularized M-estimators, allowing for nonconvexity in both loss and penalty functions. Under restricted strong convexity on the loss and suitable regularity conditions on the penalty, we ...
Generalized hierarchical kernel learning
This paper generalizes the framework of Hierarchical Kernel Learning (HKL) and illustrates its utility in the domain of rule learning. HKL involves Multiple Kernel Learning over a set of given base kernels assumed to be embedded on a directed acyclic ...
Discrete restricted Boltzmann machines
We describe discrete restricted Boltzmann machines: probabilistic graphical models with bipartite interactions between visible and hidden discrete variables. Examples are binary restricted Boltzmann machines and discrete naïve Bayes models. We detail ...
Evolving GPU machine code
Cleomar Pereira Da Silva, Douglas Mota Dias, Cristiana Bentes, Marco Aurélio Cavalcanti Pacheco, Leandro Fontoura Cupertino
Parallel Graphics Processing Unit (GPU) implementations of GP have appeared in the literature using three main methodologies: (i) compilation, which generates the individuals in GPU code and requires compilation; (ii) pseudo-assembly, which generates ...
A compression technique for analyzing disagreement-based active learning
We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the ...
Response-based approachability with applications to generalized no-regret problems
Blackwell's theory of approachability provides fundamental results for repeated games with vector-valued payoffs, which have been usefully applied in the theory of learning in games, and in devising online learning algorithms in the adversarial setup. A ...
Strong consistency of the prototype based clustering in probabilistic space
In this paper we formulate in general terms an approach to prove strong consistency of the Empirical Risk Minimisation inductive principle applied to the prototype or distance based clustering. This approach was motivated by the Divisive Information-...
Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm
We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average ...
A statistical perspective on algorithmic leveraging
One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales rows/columns of data ...
Distributed matrix completion and robust factorization
If learning methods are to scale to the massive sizes of modern data sets, it is essential for the field of machine learning to embrace parallel and distributed computing. Inspired by the recent development of matrix factorization methods with rich ...
Combined l1 and greedy l0 penalized least squares for linear model selection
We introduce a computationally effective algorithm for a linear model selection consisting of three steps: screening-ordering-selection (SOS). Screening of predictors is based on the thresholded Lasso that is l1 penalized least squares. The screened ...
Learning with the maximum correntropy criterion induced losses for regression
Within the statistical learning framework, this paper studies the regression model associated with the correntropy induced losses. The correntropy, as a similarity measure, has been frequently employed in signal processing and pattern recognition. ...