Abstract
Statistical measures of similarity have been widely used in textual information retrieval for many decades. They are the basis for improving the effectiveness of IR systems, including retrieval, clustering, and summarization. We have developed an information retrieval system, DualNAVI, which provides users with rich interaction in both document space and word space. We show that associative calculation for measuring similarity among documents or words is the computational basis of this effective information access with DualNAVI. New approaches to document clustering (Hierarchical Bayesian Clustering) and to measuring term representativeness (the Baseline method) are also discussed. Both have a sound mathematical basis and depend essentially on associative calculation.
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takano, A. et al. (2000). Information Access Based on Associative Calculation. In: Hlaváč, V., Jeffery, K.G., Wiedermann, J. (eds) SOFSEM 2000: Theory and Practice of Informatics. SOFSEM 2000. Lecture Notes in Computer Science, vol 1963. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44411-4_12
DOI: https://doi.org/10.1007/3-540-44411-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41348-6
Online ISBN: 978-3-540-44411-4
eBook Packages: Springer Book Archive