Data mining

13 Results for: Book/Issue: SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Searched The ACM Guide to Computing Literature (3,784,770 records)

Article
July 2004
Reliability and verification of natural language text on the world wide web (abstract only)
- Melanie J. Martin
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 603
https://doi.org/10.1145/1008992.1009149

The hypothesis that information on the Web can be verified automatically, with minimal user interaction, will be tested by building and evaluating an interactive system. In this paper, verification is defined as a reasonable determination of the truth ...
Metrics
Total Citations: 2

Article
July 2004
Information extraction using two-phase pattern discovery
- Liping Ma
- John Shepherd
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 534–535
https://doi.org/10.1145/1008992.1009107

This paper presents a new two-phase pattern (2PP) discovery technique for information extraction. 2PP consists of orthographic pattern discovery (OPD) and semantic pattern discovery (SPD) where the OPD determines the structural features from an ...
Metrics
Total Citations: 3
Total Downloads: 608
Last 12 Months: 0
Last 6 weeks: 0

Article
July 2004
Collaborative filing in a document repository
- Harris Wu
- Michael D. Gordon
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 518–519
https://doi.org/10.1145/1008992.1009099

We introduce an emergent, collaborative filing system. In such a system, an individual is allowed to organize a subset of documents in a repository into a personal hierarchy and share the hierarchy with others. The system generates a "consensus" ...
Metrics
Total Citations: 5
Total Downloads: 449
Last 12 Months: 2
Last 6 weeks: 1

Article
July 2004
Web taxonomy integration through co-bootstrapping
- Dell Zhang
- Wee Sun Lee
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 410–417
https://doi.org/10.1145/1008992.1009062

We address the problem of integrating objects from a source taxonomy into a master taxonomy. This problem is not only currently pervasive on the web, but also important to the emerging semantic web. A straightforward approach to automating this process ...
Metrics
Total Citations: 16
Total Downloads: 934
Last 12 Months: 2
Last 6 weeks: 0

Article
July 2004
Restrictive clustering and metaclustering for self-organizing document collections
- Stefan Siersdorfer
- Sergej Sizov
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 226–233
https://doi.org/10.1145/1008992.1009032

This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out ...
Metrics
Total Citations: 14
Total Downloads: 940
Last 12 Months: 2
Last 6 weeks: 0

Article
July 2004
Document clustering via adaptive subspace iteration
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 218–225
https://doi.org/10.1145/1008992.1009031

Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously ...
Metrics
Total Citations: 107
Total Downloads: 1,976
Last 12 Months: 7
Last 6 weeks: 1

Article
July 2004
Learning to cluster web search results
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 210–217
https://doi.org/10.1145/1008992.1009030

Organizing Web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't generate clusters with highly readable names. In this paper, we reformalize the ...
Metrics
Total Citations: 384
Total Downloads: 4,452
Last 12 Months: 40
Last 6 weeks: 4

Article
July 2004
Document clustering by concept factorization
- Wei Xu
- Yihong Gong
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 202–209
https://doi.org/10.1145/1008992.1009029

In this paper, we propose a new data clustering method called concept factorization that models each concept as a linear combination of the data points, and each data point as a linear combination of the concepts. With this model, the data clustering ...
Metrics
Total Citations: 217
Total Downloads: 2,476
Last 12 Months: 37
Last 6 weeks: 3

Article
July 2004
Corpus structure, language models, and ad hoc information retrieval
- Oren Kurland
- Lillian Lee
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 194–201
https://doi.org/10.1145/1008992.1009027

Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel ...
Metrics
Total Citations: 118
Total Downloads: 1,332
Last 12 Months: 14
Last 6 weeks: 0

Article
July 2004
GaP: a factor model for discrete data
- John Canny
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 122–129
https://doi.org/10.1145/1008992.1009016

We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variable. GaP is a factor model, that is ...
Metrics
Total Citations: 86
Total Downloads: 1,195
Last 12 Months: 31
Last 6 weeks: 4

Article
July 2004
On scaling latent semantic indexing for large peer-to-peer systems
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 112–121
https://doi.org/10.1145/1008992.1009014

The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, ...
Metrics
Total Citations: 47
Total Downloads: 1,353
Last 12 Months: 5
Last 6 weeks: 0

Article
July 2004
Chemoinformatics: an application domain for information retrieval techniques
- Peter Willett
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 393
https://doi.org/10.1145/1008992.1008994

Chemoinformatics is the generic name for the techniques used to represent, store and process information about the two-dimensional (2D) and three-dimensional (3D) structures of chemical molecules [1, 2]. Chemoinformatics has attracted much recent ...
Metrics
Total Citations: 1
Total Downloads: 653
Last 12 Months: 0
Last 6 weeks: 0

Article
July 2004
Challenges in using lifetime personal information stores
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 1
https://doi.org/10.1145/1008992.1008993

Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, hear, and many of the images we see including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1]. For the ...
Metrics
Total Citations: 4
Total Downloads: 836
Last 12 Months: 5
Last 6 weeks: 0
