Document management and text processing

Showing 1 - 14 of 14 Results

research-article
Free
January 2023
MAUVE scores for generative models: theory and practice
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 356, Pages 17105–17196

Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is ...
Total Citations: 0, Total Downloads: 116, Last 12 Months: 116, Last 6 Weeks: 14

research-article
Free
January 2022
A statistical approach for optimal topic model identification
- Craig M. Lewis
- Francesco Grossetti
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 58, Pages 2553–2572

Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent structures in a corpus of documents. This paper addresses the ongoing concern that formal procedures for determining the optimal LDA configuration do not ...
Total Citations: 0, Total Downloads: 157, Last 12 Months: 79, Last 6 Weeks: 13

article
Free
June 2012
Confidence-weighted linear classification for text categorization
The Journal of Machine Learning Research (JMLR), Volume 13, Issue 1, Pages 1891–1926

Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as ...
Total Citations: 10, Total Downloads: 434, Last 12 Months: 53, Last 6 Weeks: 3

article
Free
June 2012
Confidence-weighted linear classification for text categorization
The Journal of Machine Learning Research (JMLR), Volume 13, Pages 1891–1926

Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as ...
Total Citations: 13, Total Downloads: 427, Last 12 Months: 42, Last 6 Weeks: 5

article
Free
February 2012
Bounding the probability of error for high precision optical character recognition
The Journal of Machine Learning Research (JMLR), Volume 13, Issue 1, Pages 363–387

We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be identified with near certainty, they can be conditioned upon, allowing ...
Total Citations: 0, Total Downloads: 221, Last 12 Months: 43, Last 6 Weeks: 11

article
Free
February 2012
Bounding the probability of error for high precision optical character recognition
The Journal of Machine Learning Research (JMLR), Volume 13, Pages 363–387

We consider a model for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low recall. If some variables can be identified with near certainty, they can be conditioned upon, allowing ...
Total Citations: 0, Total Downloads: 249, Last 12 Months: 47, Last 6 Weeks: 8

article
Free
December 2007
The Locally Weighted Bag of Words Framework for Document Representation
The Journal of Machine Learning Research (JMLR), Volume 8, Pages 2405–2441

The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document ...
Total Citations: 15, Total Downloads: 178, Last 12 Months: 41, Last 6 Weeks: 10

article
Free
December 2007
Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
- Evgeniy Gabrilovich
- Shaul Markovitch
The Journal of Machine Learning Research (JMLR), Volume 8, Pages 2297–2345

Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more ...
Total Citations: 21, Total Downloads: 346, Last 12 Months: 60, Last 6 Weeks: 13

article
Free
December 2006
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 2699–2720

In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different ...
Total Citations: 23, Total Downloads: 910, Last 12 Months: 41, Last 6 Weeks: 6

article
Free
December 2006
Active Learning with Feedback on Features and Instances
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 1655–1686

We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text ...
Total Citations: 68, Total Downloads: 842, Last 12 Months: 52, Last 6 Weeks: 12

article
Free
December 2006
Kernel-Based Learning of Hierarchical Multilabel Classification Models
The Journal of Machine Learning Research (JMLR), Volume 7, Pages 1601–1626

We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the ...
Total Citations: 70, Total Downloads: 649, Last 12 Months: 69, Last 6 Weeks: 11

article
Free
December 2004
RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research (JMLR), Volume 5, Pages 361–397

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of ...
Total Citations: 815, Total Downloads: 3,755, Last 12 Months: 227, Last 6 Weeks: 37

article
Free
March 2003
Finding the most interesting patterns in a database quickly by using sequential sampling
- Tobias Scheffer
- Stefan Wrobel
The Journal of Machine Learning Research (JMLR), Volume 3, Pages 833–862

Many discovery problems, e.g. subgroup or association rule discovery, can naturally be cast as n-best hypotheses problems where the goal is to find the n hypotheses from a given hypothesis space that score best according to a certain utility function. ...
Total Citations: 29, Total Downloads: 512, Last 12 Months: 46, Last 6 Weeks: 13

article
Free
March 2002
Text classification using string kernels
The Journal of Machine Learning Research (JMLR), Volume 2, Pages 419–444, https://doi.org/10.1162/153244302760200687

We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k ...
Total Citations: 310, Total Downloads: 3,763, Last 12 Months: 75, Last 6 Weeks: 8
