DOI: 10.1145/96749.98008

Probabilistic document indexing from relevance feedback data

Published: 01 December 1989

Abstract

Based on the binary independence indexing model, we apply three new concepts for probabilistic document indexing from relevance feedback data:
  • Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation.
  • Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods into our approach, as well as the consideration of more complex document structures or different types of terms (e.g. single words and noun phrases).
  • Probabilistic learning or classification methods for estimating the indexing weights, which make better use of the available relevance information.
We give experimental results for five test collections which show improvements over other indexing methods.
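
The idea of abstracting from specific terms and documents can be illustrated with a minimal sketch. It assumes that each term-document pair from the feedback data (with the term occurring in both the query and the document) is reduced to a small feature vector, here called a "relevance description". The features used below (`tf`, `idf_bucket`, `in_title`) and all function names are hypothetical. The probabilistic learning or classification methods mentioned in the abstract are replaced here by plain smoothed relative frequencies, which estimate the probability of relevance for each description and use it as the indexing weight. Because every estimate is pooled over all terms and documents sharing a description, no relevance data is needed for any individual term.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDocPair:
    """Hypothetical features of one term occurring in one document."""
    tf: int          # within-document frequency of the term
    idf_bucket: int  # discretized inverse document frequency of the term
    in_title: bool   # whether the term appears in the document title


def relevance_description(pair: TermDocPair) -> tuple:
    """Abstract from the specific term and document: keep only features."""
    tf_class = min(pair.tf, 3)  # pool all tf >= 3 into one class
    return (tf_class, pair.idf_bucket, pair.in_title)


def learn_indexing_function(sample, prior=0.5, strength=1.0):
    """Estimate P(relevant | description) by smoothed relative frequencies.

    `sample` is an iterable of (TermDocPair, relevant) tuples, where each
    pair comes from a judged query-document combination and the term
    occurs in both the query and the document (an assumption of this sketch).
    """
    relevant_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for pair, relevant in sample:
        x = relevance_description(pair)
        total_counts[x] += 1
        relevant_counts[x] += int(relevant)
    # Additive smoothing pulls sparsely observed descriptions towards `prior`.
    return {
        x: (relevant_counts[x] + strength * prior) / (total_counts[x] + strength)
        for x in total_counts
    }


def indexing_weight(weights, pair: TermDocPair, prior=0.5):
    """Weight of a term in a document; unseen descriptions get the prior."""
    return weights.get(relevance_description(pair), prior)


if __name__ == "__main__":
    # Toy feedback sample, invented for illustration only.
    sample = [
        (TermDocPair(tf=5, idf_bucket=2, in_title=True), True),
        (TermDocPair(tf=4, idf_bucket=2, in_title=True), True),
        (TermDocPair(tf=3, idf_bucket=2, in_title=True), False),
        (TermDocPair(tf=1, idf_bucket=0, in_title=False), False),
    ]
    weights = learn_indexing_function(sample)
    # A term never seen in the feedback data still gets a learned weight,
    # because it shares the description (3, 2, True) with the first three pairs.
    print(indexing_weight(weights, TermDocPair(tf=7, idf_bucket=2, in_title=True)))  # 0.625
```

The smoothing strength and the choice of features are free parameters of this sketch; a richer learning method would replace the counting step while keeping the same abstraction into relevance descriptions.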




Published In

ACM Conferences
SIGIR '90: Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
December 1989
509 pages
ISBN: 0897914082
DOI: 10.1145/96749
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR'90

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 210
  • Downloads (last 6 weeks): 41
Reflects downloads up to 30 Aug 2024

Citations

  • (2013) Towards graphical models for text processing. Knowledge and Information Systems 36:1 (1-21). DOI: 10.1007/s10115-012-0552-3. Online publication date: 1-Jul-2013
  • (2011) The optimum clustering framework: implementing the cluster hypothesis. Information Retrieval 15:2 (93-115). DOI: 10.1007/s10791-011-9173-9. Online publication date: 12-Jul-2011
  • (2011) Text Mining in Social Networks. Social Network Data Analytics (353-378). DOI: 10.1007/978-1-4419-8462-3_13. Online publication date: 17-Mar-2011
  • (2006) Documents and queries as random variables: History and implications. Journal of the American Society for Information Science and Technology 57:9 (1138-1154). DOI: 10.1002/asi.20378. Online publication date: 8-May-2006
  • (2004) Improving document representations using relevance feedback. Proceedings of the thirteenth ACM international conference on Information and knowledge management (270-278). DOI: 10.1145/1031171.1031230. Online publication date: 13-Nov-2004
  • (1997) Uncertainty in Information Retrieval Systems. Uncertainty Management in Information Systems (189-224). DOI: 10.1007/978-1-4615-6245-0_7. Online publication date: 1997
  • (1995) Text retrieval in the legal world. Artificial Intelligence and Law 3:1-2 (5-54). DOI: 10.1007/BF00877694. Online publication date: 1-Mar-1995
  • (1992) Information filtering and information retrieval. Communications of the ACM 35:12 (29-38). DOI: 10.1145/138859.138861. Online publication date: 1-Dec-1992
  • (1992) A system for retrieving speech documents. Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval (168-176). DOI: 10.1145/133160.133194. Online publication date: 1-Jun-1992
  • (1992) An evaluation of phrasal and clustered representations on a text categorization task. Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval (37-50). DOI: 10.1145/133160.133172. Online publication date: 1-Jun-1992