Abstract
In this paper, we propose a data mining approach to query refinement that uses association rules (ARs) among keywords extracted from a document database. When a query is under-specified or contains ambiguous keywords, a set of association rules is displayed to help the user choose additional keywords with which to refine the original query. To the best of our knowledge, no reported study has discussed how to use ARs to screen the number of documents retrieved. We address two issues in this paper. First, an AR X ⟹ Y with high confidence tends to indicate that the number of documents containing both keyword sets X and Y is large. Minimum support and minimum confidence thresholds are therefore of little use for screening documents. To address this issue, maximum support and maximum confidence are used. Second, a large number of rules must be stored in a rule base and displayed at run time in response to a user query. To reduce the number of rules, we introduce two related concepts: “stem rule” and “coverage”. Stem rules are the rules from which other rules can be derived. A set of keywords is said to be a coverage of a set of documents if these documents can be retrieved using that set of keywords. A minimum coverage reduces the number of keywords needed to cover a given number of documents, and can therefore help reduce the number of rules to be managed. To demonstrate the applicability of the proposed method, we have built an interactive interface and maintain a medium-sized document database. The effectiveness of using ARs to screen documents is also addressed.
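The screening argument above can be illustrated with a minimal sketch. It assumes the standard definitions of support and confidence over keyword sets; the toy corpus, the function names, and the way the maximum thresholds are applied are all hypothetical and are not taken from the paper, whose exact definitions the abstract does not give.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical toy corpus: each document is the set of keywords extracted from it.
DOCS: List[Set[str]] = [
    {"database", "query", "mining"},
    {"database", "query", "index"},
    {"database", "query", "mining", "rule"},
    {"database", "query"},
    {"network", "protocol"},
]


def matching(keywords: Iterable[str], docs: List[Set[str]]) -> List[int]:
    """Indices of the documents that contain every keyword in `keywords`."""
    wanted = set(keywords)
    return [i for i, doc in enumerate(docs) if wanted <= doc]


def support(keywords: Iterable[str], docs: List[Set[str]]) -> float:
    """Fraction of all documents that contain every keyword."""
    return len(matching(keywords, docs)) / len(docs)


def confidence(x: Iterable[str], y: Iterable[str], docs: List[Set[str]]) -> float:
    """Confidence of X => Y: share of documents matching X that also match Y."""
    x_set, y_set = set(x), set(y)
    n_x = len(matching(x_set, docs))
    if n_x == 0:
        return 0.0
    return len(matching(x_set | y_set, docs)) / n_x


def refining_keywords(
    x: Iterable[str], docs: List[Set[str]], max_sup: float, max_conf: float
) -> List[Tuple[str, float]]:
    """Assumed use of maximum thresholds: keep single keywords Y such that
    X => Y has non-zero confidence at most `max_conf` and the support of
    X | Y is at most `max_sup`, so adding Y actually narrows the result."""
    x_set = set(x)
    vocabulary = set().union(*docs) - x_set
    kept = []
    for kw in sorted(vocabulary):
        conf = confidence(x_set, {kw}, docs)
        sup = support(x_set | {kw}, docs)
        if 0 < conf <= max_conf and sup <= max_sup:
            kept.append((kw, conf))
    return kept
```

In this toy corpus the rule {database} ⟹ {query} has confidence 1.0, so adding "query" to the query "database" still retrieves the same four documents and screens nothing. Calling `refining_keywords({"database"}, DOCS, max_sup=0.5, max_conf=0.5)` keeps "index", "mining" and "rule", each of which cuts the result to at most two documents.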
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Chen, H., Yu, J.X., Ohbo, N. (1998). Using stem rules to refine document retrieval queries. In: Andreasen, T., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 1998. Lecture Notes in Computer Science, vol 1495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056006
DOI: https://doi.org/10.1007/BFb0056006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65082-9
Online ISBN: 978-3-540-49655-7
eBook Packages: Springer Book Archive