Abstract
Treating documents as a bag of words is the norm in Information Filtering: syntactic and semantic correlations between terms are ignored or, in other words, term independence is assumed. In this paper we challenge this common assumption. We use Nootropia, a user profiling model that applies a sliding window approach to capture term dependencies in a network, and a spreading activation process to take these dependencies into account when evaluating documents. Experiments following TREC's routing guidelines demonstrate that, given an adequate window size, the additional information encoded by term dependencies results in improved filtering performance over a traditional bag-of-words approach.
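The abstract does not give Nootropia's actual weighting or activation functions, so the following is only a minimal sketch of the general "window versus bag" idea under loudly stated assumptions: whitespace tokenisation, node weights from normalised term frequency, link weights from normalised co-occurrence counts within a sliding window, and a single spreading step (scaled by a hypothetical `spread` factor) along profile links whose two terms also share a window in the document. All function names are hypothetical. The toy example contrasts a bag-of-words score with the window-based score on two documents that contain largely the same profile terms.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation, no stemming or stop-word removal.
    return text.lower().split()


def window_links(tokens, window):
    """Count how often two distinct terms fall inside the same sliding
    window of `window` consecutive tokens."""
    links = Counter()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                links[frozenset((term, other))] += 1
    return links


def build_profile(relevant_docs, window):
    """Hypothetical profile network: node weights from term frequency and
    link weights from windowed co-occurrence, each scaled to (0, 1]."""
    tf, links = Counter(), Counter()
    for doc in relevant_docs:
        tokens = tokenize(doc)
        tf.update(tokens)
        links.update(window_links(tokens, window))
    max_tf = max(tf.values())
    max_link = max(links.values(), default=1)
    nodes = {t: c / max_tf for t, c in tf.items()}
    edges = {pair: c / max_link for pair, c in links.items()}
    return nodes, edges, window


def bag_of_words_score(profile, doc):
    """Baseline: sum the weights of the profile terms present in the document."""
    nodes, _, _ = profile
    present = set(tokenize(doc))
    return sum(w for t, w in nodes.items() if t in present)


def spreading_activation_score(profile, doc, spread=0.5):
    """Activate the profile terms found in the document, then let every
    linked pair of active terms exchange spread * link_weight * activation,
    but only if the two terms also share a window in the document."""
    nodes, edges, window = profile
    tokens = tokenize(doc)
    active = {t: nodes[t] for t in set(tokens) if t in nodes}
    doc_pairs = window_links(tokens, window)
    energy = defaultdict(float, active)
    for pair, weight in edges.items():
        if pair in doc_pairs:
            a, b = tuple(pair)
            energy[a] += spread * weight * active[b]
            energy[b] += spread * weight * active[a]
    return sum(energy.values())


profile = build_profile(
    ["adaptive information filtering with a user profile",
     "user profile learning for information filtering"],
    window=3,
)
near = "information filtering for a user profile"
far = "information about filtering water for a user with a profile"

print(bag_of_words_score(profile, near), bag_of_words_score(profile, far))
# 5.0 5.5  (bag of words prefers the document with scattered terms)
print(spreading_activation_score(profile, near), spreading_activation_score(profile, far))
# 8.875 8.25  (the window-based score prefers the document that keeps profile terms together)
```

Under these assumptions the ranking flips: the bag-of-words baseline rewards the extra profile term in the scattered document, while the window-based score rewards the document that preserves the profile's local term combinations. Changing `window` changes which term pairs count as dependent, which is why the window size matters.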
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nanas, N., Vavalis, M. (2008). A “Bag” or a “Window” of Words for Information Filtering? In: Darzentas, J., Vouros, G.A., Vosinakis, S., Arnellos, A. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2008. Lecture Notes in Computer Science, vol 5138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87881-0_17
DOI: https://doi.org/10.1007/978-3-540-87881-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87880-3
Online ISBN: 978-3-540-87881-0
eBook Packages: Computer Science, Computer Science (R0)