Abstract
The huge volume of electronic documents has created a demand for intelligent access to the information they contain. Document retrieval has been investigated as a fundamental tool for meeting this demand. However, it remains unsatisfactory because of (1) inaccuracy when long documents are retrieved with short queries (a few terms), and (2) the burden placed on users of finding the relevant parts of long retrieved documents. In this paper, we apply a passage retrieval method called “density distributions” (DD) to tackle these problems. For the first problem, we show experimentally that a passage-based method outperforms conventional document retrieval methods when long documents are retrieved with short queries. For the second problem, we apply DD to the question answering task: locating short passages in response to natural language queries that seek facts. Preliminary experiments show that correct answers can be located within a window of 50 terms for about half of such queries.
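The abstract does not spell out how a density distribution is computed. As a rough illustration only, the sketch below assumes one common reading: each occurrence of a query term contributes its weight to nearby word positions through a Hann (raised-cosine) window, and the passage is taken around the position where the summed density peaks. The function names, the choice of window, the uniform term weights, and the use of 50 as the default width are all assumptions made for this example, not details confirmed by the chapter.

```python
import math


def hann_window(width):
    """Raised-cosine weights for offsets -half..half (2*half+1 values, peak 1.0 at offset 0)."""
    half = width // 2
    return [0.5 * (1.0 + math.cos(math.pi * x / (half + 1)))
            for x in range(-half, half + 1)]


def density_distribution(doc_terms, query_weights, width=50):
    """Spread each query-term occurrence over neighbouring positions.

    doc_terms:     list of tokens in document order
    query_weights: dict mapping query term -> weight (hypothetical; e.g. idf)
    Returns one density value per token position.
    """
    half = width // 2
    window = hann_window(width)
    n = len(doc_terms)
    density = [0.0] * n
    for pos, term in enumerate(doc_terms):
        weight = query_weights.get(term, 0.0)
        if weight == 0.0:
            continue  # not a query term, contributes nothing
        for k, h in enumerate(window):
            p = pos + k - half  # position that receives this contribution
            if 0 <= p < n:
                density[p] += weight * h
    return density


def best_passage(doc_terms, query_weights, width=50):
    """Return (peak position, peak density, tokens around the peak), or None if empty."""
    density = density_distribution(doc_terms, query_weights, width)
    if not density:
        return None
    peak = max(range(len(density)), key=density.__getitem__)
    half = width // 2
    start = max(0, peak - half)
    end = min(len(doc_terms), peak + half + 1)
    return peak, density[peak], doc_terms[start:end]


# Toy document: three query terms clustered in the middle of filler tokens.
doc = ["x"] * 20 + ["tokyo", "tower", "height"] + ["x"] * 20
query = {"tokyo": 1.0, "tower": 1.0, "height": 1.0}
peak, score, passage = best_passage(doc, query, width=10)
```

Under these assumptions, the peak density could serve as a document score for retrieval with short queries, while the tokens around the peak give a short passage to show the user or to search for an answer.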
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kise, K., Junker, M., Dengel, A., Matsumoto, K. (2004). Passage Retrieval Based on Density Distributions of Terms and Its Applications to Document Retrieval and Question Answering. In: Dengel, A., Junker, M., Weisbecker, A. (eds) Reading and Learning. Lecture Notes in Computer Science, vol 2956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24642-8_17
DOI: https://doi.org/10.1007/978-3-540-24642-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21904-0
Online ISBN: 978-3-540-24642-8
eBook Packages: Springer Book Archive