- Article, July 2004
Reliability and verification of natural language text on the world wide web (abstract only)
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 603. https://doi.org/10.1145/1008992.1009149
The hypothesis that information on the Web can be verified automatically, with minimal user interaction, will be tested by building and evaluating an interactive system. In this paper, verification is defined as a reasonable determination of the truth ...
- Article, July 2004
Information extraction using two-phase pattern discovery
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 534–535. https://doi.org/10.1145/1008992.1009107
This paper presents a new two-phase pattern (2PP) discovery technique for information extraction. 2PP consists of orthographic pattern discovery (OPD) and semantic pattern discovery (SPD) where the OPD determines the structural features from an ...
- Article, July 2004
Collaborative filing in a document repository
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 518–519. https://doi.org/10.1145/1008992.1009099
We introduce an emergent, collaborative filing system. In such a system, an individual is allowed to organize a subset of documents in a repository into a personal hierarchy and share the hierarchy with others. The system generates a "consensus" ...
- Article, July 2004
Web taxonomy integration through co-bootstrapping
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 410–417. https://doi.org/10.1145/1008992.1009062
We address the problem of integrating objects from a source taxonomy into a master taxonomy. This problem is not only currently pervasive on the web, but also important to the emerging semantic web. A straightforward approach to automating this process ...
- Article, July 2004
Restrictive clustering and metaclustering for self-organizing document collections
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 226–233. https://doi.org/10.1145/1008992.1009032
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out ...
- Article, July 2004
Document clustering via adaptive subspace iteration
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 218–225. https://doi.org/10.1145/1008992.1009031
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously ...
- Article, July 2004
Learning to cluster web search results
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 210–217. https://doi.org/10.1145/1008992.1009030
Organizing Web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't generate clusters with highly readable names. In this paper, we reformalize the ...
- Article, July 2004
Document clustering by concept factorization
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 202–209. https://doi.org/10.1145/1008992.1009029
In this paper, we propose a new data clustering method called concept factorization that models each concept as a linear combination of the data points, and each data point as a linear combination of the concepts. With this model, the data clustering ...
- Article, July 2004
Corpus structure, language models, and ad hoc information retrieval
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 194–201. https://doi.org/10.1145/1008992.1009027
Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel ...
- Article, July 2004
GaP: a factor model for discrete data
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 122–129. https://doi.org/10.1145/1008992.1009016
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variable. GaP is a factor model, that is ...
- Article, July 2004
On scaling latent semantic indexing for large peer-to-peer systems
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 112–121. https://doi.org/10.1145/1008992.1009014
The exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text, music, and images. A promising direction is to combine information retrieval with peer-to-peer technology for scalability, ...
- Article, July 2004
Chemoinformatics: an application domain for information retrieval techniques
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 393. https://doi.org/10.1145/1008992.1008994
Chemoinformatics is the generic name for the techniques used to represent, store and process information about the two-dimensional (2D) and three-dimensional (3D) structures of chemical molecules [1, 2]. Chemoinformatics has attracted much recent ...
- Article, July 2004
Challenges in using lifetime personal information stores
SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Page 1. https://doi.org/10.1145/1008992.1008993
Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, hear, and many of the images we see including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1]. For the ...
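The concept factorization entry in this listing states its model only in words: each concept is a linear combination of the data points, and each data point is a linear combination of the concepts. For a nonnegative terms-by-documents matrix X this can be written X ≈ XWVᵀ, where the columns of XW are the concepts and row j of V holds document j's concept weights. The Python sketch below fits that model by minimizing the squared reconstruction error with multiplicative updates; with K = XᵀX, the gradient terms give W ← W ∘ (KV) / (KWVᵀV) and V ← V ∘ (KW) / (VWᵀKW). Because the listed abstract is truncated, the update rules, the function names (`concept_factorization`, `reconstruction_error`, `cluster_labels`), the toy matrix, and the largest-weight cluster assignment are all assumptions for illustration, not the paper's published procedure.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def concept_factorization(x, k, iters=200, seed=0, eps=1e-9):
    """Sketch: fit X ~ X W V^T for a nonnegative terms-by-documents matrix x.

    Column c of X W is concept c (a combination of documents); row j of V
    holds document j's weights over the k concepts. Update rules are an
    assumption derived from the squared-error objective.
    """
    n = len(x[0])  # number of documents (columns of x)
    rng = random.Random(seed)
    # Strictly positive random starts; multiplicative updates keep entries nonnegative.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    kmat = matmul(transpose(x), x)  # K = X^T X, documents by documents
    for _ in range(iters):
        # W <- W * (K V) / (K W V^T V)
        num = matmul(kmat, v)
        den = matmul(matmul(kmat, w), matmul(transpose(v), v))
        w = [[w[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(n)]
        # V <- V * (K W) / (V W^T K W), using the updated W
        kw = matmul(kmat, w)
        den = matmul(v, matmul(transpose(w), kw))
        v = [[v[i][c] * kw[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(n)]
    return w, v


def reconstruction_error(x, w, v):
    """Squared Frobenius norm of X - X W V^T."""
    approx = matmul(matmul(x, w), transpose(v))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


def cluster_labels(v):
    """Hypothetical assignment: each document goes to its largest-weight concept."""
    return [max(range(len(row)), key=row.__getitem__) for row in v]


if __name__ == "__main__":
    # Toy 4-term x 4-document matrix: documents 0-1 share terms 0-1,
    # documents 2-3 share terms 2-3.
    x = [[2, 3, 0, 0],
         [3, 2, 0, 0],
         [0, 0, 1, 2],
         [0, 0, 2, 1]]
    w, v = concept_factorization(x, k=2)
    print("error:", round(reconstruction_error(x, w, v), 3))
    print("labels:", cluster_labels(v))
```

In this sketch the data enter the updates only through K = XᵀX, so it could be adapted to accept a precomputed nonnegative document-similarity matrix instead of X; normalization of W and V, which a full implementation might add, is omitted.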