Learning Rates for Q-learning
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t, where ω ∈ (1/2, 1), we show ...
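The update rule in question is easy to state concretely. Below is a minimal sketch of a tabular Q-learning update with a polynomial learning rate; the state/action indexing, the discount factor, and the use of the counter t are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, t, omega=0.7, gamma=0.9):
    """One tabular Q-learning update with the polynomial learning rate
    alpha_t = 1 / t**omega, omega in (1/2, 1), as in the abstract.
    Q is a 2-D array indexed by (state, action); t would typically be
    the visit count of the pair (s, a)."""
    alpha = 1.0 / t**omega                   # polynomial learning rate
    target = r + gamma * np.max(Q[s_next])   # bootstrapped one-step target
    Q[s, a] += alpha * (target - Q[s, a])    # standard Q-learning correction
    return Q
```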
Learning the Kernel Matrix with Semidefinite Programming
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of ...
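As a rough illustration of optimizing over the kernel (Gram) matrix itself, here is a hedged sketch that learns a nonnegative combination of candidate Gram matrices by maximizing alignment with the labels. Restricting the weights to be nonnegative keeps the combination positive semidefinite without an explicit semidefinite constraint, so this is a simplification of the paper's full SDP formulation; the cvxpy modeling and the alignment objective are illustrative choices.

```python
import cvxpy as cp
import numpy as np

def learn_kernel_weights(kernels, y, trace_bound=1.0):
    """Sketch: learn K = sum_i mu_i K_i over candidate Gram matrices by
    maximizing the Frobenius inner product <K, y y^T> (kernel-target
    alignment), with mu >= 0 and a trace constraint on K."""
    m = len(kernels)
    mu = cp.Variable(m, nonneg=True)          # nonnegative combination weights
    K = sum(mu[i] * kernels[i] for i in range(m))
    yyT = np.outer(y, y)
    alignment = cp.sum(cp.multiply(K, yyT))   # <K, y y^T>
    problem = cp.Problem(cp.Maximize(alignment), [cp.trace(K) == trace_bound])
    problem.solve()
    return mu.value
```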
Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality ...
In Defense of One-Vs-All Classification
We consider the problem of multiclass classification. Our main thesis is that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support ...
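The scheme itself takes only a few lines. A minimal sketch, assuming scikit-learn's SVC as the well-tuned regularized base learner (any comparable binary classifier would do):

```python
import numpy as np
from sklearn.svm import SVC

class OneVsAll:
    """One-vs-all multiclass: one binary classifier per class, predict the
    class whose classifier is most confident on the input."""
    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Train one binary SVM per class with +1/-1 targets.
        self.models_ = [SVC(kernel="rbf", C=1.0).fit(X, np.where(y == c, 1, -1))
                        for c in self.classes_]
        return self
    def predict(self, X):
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]
```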
Lossless Online Bayesian Bagging
Bagging frequently improves the predictive performance of a model. An online version has recently been introduced, which attempts to gain the benefits of an online algorithm while approximating regular bagging. However, regular online bagging is an ...
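For context, the standard online approximation to bagging (due to Oza and Russell) presents each incoming example to each base learner k ~ Poisson(1) times. The sketch below illustrates that scheme; the partial_fit-style interface is an assumption about the base learner, and the comments note how a Bayesian variant with continuous weights avoids the approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

def online_bagging_update(learners, x, y):
    """Poisson-based online bagging: each learner sees the new example
    k ~ Poisson(1) times, approximating the multinomial resampling weights
    of batch bagging. A lossless Bayesian variant instead draws continuous
    Gamma(1) weights, so no example is ever fully dropped."""
    for learner in learners:
        k = rng.poisson(1.0)                 # approximate bootstrap count
        for _ in range(k):
            learner.partial_fit([x], [y])    # assumed incremental interface
```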
Subgroup Discovery with CN2-SD
This paper investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual. The ...
Generalization Error Bounds for Threshold Decision Lists
In this paper we consider the generalization accuracy of classification methods based on the iterative use of linear classifiers. The resulting classifiers, which we call threshold decision lists, act as follows. Some points of the data set to be ...
On the Importance of Small Coordinate Projections
It has been recently shown that sharp generalization bounds can be obtained when the function class from which the algorithm chooses its hypotheses is "small" in the sense that the Rademacher averages of this function class are small. We show that a new ...
Weather Data Mining Using Independent Component Analysis
In this article, we apply independent component analysis to the mining of spatio-temporal data. The technique has been applied to mine for patterns in weather data using the North Atlantic Oscillation (NAO) as a specific example. We find that ...
Online Choice of Active Learning Algorithms
This work is concerned with the question of how to combine online an ensemble of active learners so as to expedite the learning progress in pool-based active learning. We develop an active-learning master algorithm, based on a known competitive ...
A Compression Approach to Support Vector Model Selection
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct "compression coefficients" for SVMs ...
A Geometric Approach to Multi-Criterion Reinforcement Learning
We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other ...
RCV1: A New Benchmark Collection for Text Categorization Research
Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of ...
Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and Nonmetric Spaces
We present a novel approach for embedding general metric and nonmetric spaces into low-dimensional Euclidean spaces. As opposed to traditional multidimensional scaling techniques, which minimize the distortion of pairwise distances, our embedding ...
Learning Ensembles from Bites: A Scalable and Accurate Approach
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. On massive data sets, however, these techniques are limited because the sheer size of the data becomes a bottleneck. Voting many classifiers ...
Robust Principal Component Analysis with Adaptive Selection for Tuning Parameters
The present paper discusses robustness against outliers in principal component analysis (PCA). We propose a class of procedures for PCA based on the minimum psi principle, which unifies various approaches, including the classical procedure and ...
PAC-learnability of Probabilistic Deterministic Finite State Automata
We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add further parameters to the sample complexity polynomial, namely a bound on the expected length ...
Sources of Success for Boosted Wrapper Induction
In this paper, we examine an important recent rule-based information extraction (IE) technique named Boosted Wrapper Induction (BWI) by conducting experiments on a wider variety of tasks than previously studied, including tasks using several collections ...
Computable Shell Decomposition Bounds
Haussler, Kearns, Seung and Tishby introduced the notion of a shell decomposition of the union bound as a means of understanding certain empirical phenomena in learning curves such as phase transitions. Here we use a variant of their ideas to derive an ...
Exact Bayesian Structure Discovery in Bayesian Networks
Learning a Bayesian network structure from data is a well-motivated but computationally hard task. We present an algorithm that computes the exact posterior probability of a subnetwork, e.g., a directed edge; a modified version of the algorithm finds ...
A Universal Well-Calibrated Algorithm for On-line Classification
We study the problem of on-line classification in which the prediction algorithm, for each "significance level" δ, is required to output as its prediction a range of labels (intuitively, those labels deemed compatible with the available data at the ...
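A region predictor of this kind can be sketched with a conformal-prediction-style p-value test; the nonconformity scores and their source are illustrative assumptions here, not the paper's construction.

```python
import numpy as np

def prediction_region(scores_per_label, past_scores, delta):
    """Include every candidate label whose nonconformity score is not among
    the most extreme delta-fraction relative to previously observed scores,
    i.e. whose p-value exceeds the significance level delta."""
    region = []
    n = len(past_scores) + 1
    past = np.asarray(past_scores)
    for label, s in scores_per_label.items():
        p_value = (np.sum(past >= s) + 1) / n   # rank-based p-value
        if p_value > delta:                     # label still compatible
            region.append(label)
    return region
```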
New Techniques for Disambiguation in Natural Language and Their Application to Biological Text
We study the problems of disambiguation in natural language, focusing on the problem of gene vs. protein name disambiguation in biological text and also considering the problem of context-sensitive spelling error correction. We introduce a new family of ...
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. (2002) that given n arms, a total of O((n/ε²) log(1/δ)) trials suffices in order to find an ε-optimal arm with probability ...
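For orientation, the naive baseline behind such bounds is uniform sampling plus Hoeffding's inequality and a union bound, which costs an extra log n factor over the bound quoted above; a minimal sketch:

```python
import math

def naive_pac_best_arm(arms, eps, delta):
    """Pull each of the n arms m = ceil((2/eps^2) * ln(2n/delta)) times and
    return the arm with the best empirical mean. By Hoeffding plus a union
    bound, each mean is within eps/2 of its true value with probability
    >= 1 - delta, so the winner is eps-optimal: O((n/eps^2) log(n/delta))
    pulls in total. Median elimination (Even-Dar et al.) removes the log n
    factor, matching the bound quoted in the abstract.
    `arms` is assumed to be a list of zero-argument callables returning
    rewards in [0, 1]."""
    n = len(arms)
    m = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    means = [sum(arm() for _ in range(m)) / m for arm in arms]
    return max(range(n), key=lambda i: means[i])
```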
Preference Elicitation and Query Learning
In this paper we explore the relationship between "preference elicitation", a learning-style problem that arises in combinatorial auctions, and the problem of learning via queries studied in computational learning theory. Preference elicitation is the ...
Distance-Based Classification with Lipschitz Functions
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision ...
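One concrete member of such a function class is the difference of distances to the nearest positive and nearest negative training points, which is Lipschitz with respect to the metric. This tiny sketch only illustrates the kind of decision function involved, not the paper's algorithm:

```python
def lipschitz_decision(x, pos, neg, d):
    """Lipschitz decision function on a metric space: positive values favor
    the positive class. d is any metric, e.g. edit distance on strings;
    pos and neg are the positive and negative training points."""
    return min(d(x, z) for z in neg) - min(d(x, z) for z in pos)
```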
Hierarchical Latent Class Models for Cluster Analysis
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known ...
Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods
Bias-variance analysis provides a tool to study learning algorithms and can be used to properly design ensemble methods well tuned to the properties of a specific base learner. Indeed the effectiveness of ensemble methods critically depends on accuracy, ...
A Fast Algorithm for Joint Diagonalization with Non-orthogonal Transformations and its Application to Blind Source Separation
A new efficient algorithm is presented for joint diagonalization of several matrices. The algorithm is based on the Frobenius-norm formulation of the joint diagonalization problem, and addresses diagonalization with a general, non-orthogonal ...
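The Frobenius-norm formulation referred to here measures the off-diagonal energy left in the transformed matrices; a minimal sketch of that cost follows (the paper's fast update rules for minimizing it over non-orthogonal transforms are not reproduced):

```python
import numpy as np

def off_diag_cost(W, matrices):
    """Joint-diagonalization objective: sum over the target matrices A_k of
    the squared off-diagonal entries of W A_k W^T. A perfect joint
    diagonalizer W drives this cost to zero."""
    cost = 0.0
    for A in matrices:
        M = W @ A @ W.T
        cost += np.sum(M**2) - np.sum(np.diag(M)**2)  # off-diagonal energy
    return cost
```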
Feature Discovery in Non-Metric Pairwise Data
Pairwise proximity data, given as a similarity or dissimilarity matrix, can violate metricity. This occurs due to noise, fallible estimates, or intrinsic non-metric features such as those arising from human judgments. So far the problem of non-...
Probability Product Kernels
The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions. In the probability product kernel, data points in the input space are mapped to distributions over ...
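The probability product kernel evaluates K(p, p') = ∫ p(x)^ρ p'(x)^ρ dx between the distributions fitted to the data points. For two Gaussians with ρ = 1 (the expected-likelihood special case) the integral has a simple closed form; the sketch below assumes that case, and other choices such as ρ = 1/2 (the Bhattacharyya kernel) also admit closed forms not shown here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def expected_likelihood_kernel(mu1, cov1, mu2, cov2):
    """Probability product kernel with rho = 1 between two Gaussians:
    the integral of N(x; mu1, cov1) * N(x; mu2, cov2) over x equals the
    density N(mu1; mu2, cov1 + cov2)."""
    return multivariate_normal.pdf(mu1, mean=mu2,
                                   cov=np.asarray(cov1) + np.asarray(cov2))
```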