- research-article, January 2023
MAUVE scores for generative models: theory and practice
Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 356, Pages 17105–17196
Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is ...
- research-article, January 2022
A statistical approach for optimal topic model identification
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 58, Pages 2553–2572
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent structures in a corpus of documents. This paper addresses the ongoing concern that formal procedures for determining the optimal LDA configuration do not ...
- article, June 2012
Confidence-weighted linear classification for text categorization
The Journal of Machine Learning Research (JMLR), Volume 13, Pages 1891–1926
Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as ...
- article, February 2012
Bounding the probability of error for high precision optical character recognition
We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be identified with near certainty, they can be conditioned upon, allowing ...
- article, December 2007
The Locally Weighted Bag of Words Framework for Document Representation
The Journal of Machine Learning Research (JMLR), Volume 8, Pages 2405–2441
The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document ...
- article, December 2007
Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
The Journal of Machine Learning Research (JMLR), Volume 8, Pages 2297–2345
Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more ...
- article, December 2006
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 2699–2720
In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different ...
- article, December 2006
Active Learning with Feedback on Features and Instances
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 1655–1686
We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text ...
- article, December 2006
Kernel-Based Learning of Hierarchical Multilabel Classification Models
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 1601–1626
We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the ...
- article, December 2004
RCV1: A New Benchmark Collection for Text Categorization Research
Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of ...
- article, March 2003
Finding the most interesting patterns in a database quickly by using sequential sampling
Many discovery problems, e.g. subgroup or association rule discovery, can naturally be cast as n-best hypotheses problems where the goal is to find the n hypotheses from a given hypothesis space that score best according to a certain utility function. ...
- article, March 2002
Text classification using string kernels
The Journal of Machine Learning Research (JMLR), Volume 2, Pages 419–444
https://doi.org/10.1162/153244302760200687
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k ...