DOI: 10.1109/WIIAT.2008.346
Article

Integrating Structure in the Probabilistic Model for Information Retrieval

Published: 09 December 2008

Abstract

In databases and on the World Wide Web, many documents are stored in a structured format (e.g., XML). In this article, we propose extending the classical probabilistic model of information retrieval (IR) so that it takes document structure into account by weighting tags. Our approach includes a learning step that computes a weight for each tag. This weight estimates the probability that the tag distinguishes the most relevant terms. We evaluated our model on a large collection in the INEX (INitiative for the Evaluation of XML Retrieval) evaluation campaigns.
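The abstract does not give the scoring formula or the estimator used for the tag weights. The following is a minimal sketch built on two assumptions. First, it uses a BM25-style probabilistic model in which each term occurrence adds the weight of its enclosing tag to the term frequency, in the spirit of field-weighted BM25. Second, it uses a hypothetical estimator that sets a tag's weight to the smoothed fraction of its term occurrences found in documents judged relevant. The `(term, tag)` document representation, the function names `learn_tag_weights`, `weighted_tf`, and `bm25_scores`, the default weight for unseen tags, and the parameter values `k1=1.2` and `b=0.75` are all illustrative. None of them is the authors' method.

```python
import math
from collections import Counter

# Assumption: a document is a flat list of (term, enclosing_tag) occurrences.
Doc = list[tuple[str, str]]


def learn_tag_weights(docs: dict[str, Doc], relevant: set[str]) -> dict[str, float]:
    """Hypothetical learning step: estimate, for each tag, the probability that
    a term occurrence inside that tag belongs to a relevant document."""
    in_rel: Counter = Counter()
    total: Counter = Counter()
    for doc_id, doc in docs.items():
        for _term, tag in doc:
            total[tag] += 1
            if doc_id in relevant:
                in_rel[tag] += 1
    # Laplace smoothing keeps every weight strictly between 0 and 1.
    return {tag: (in_rel[tag] + 1) / (total[tag] + 2) for tag in total}


def weighted_tf(doc: Doc, weights: dict[str, float], default: float = 0.5) -> Counter:
    """Term frequency where each occurrence counts as the weight of its tag."""
    tf: Counter = Counter()
    for term, tag in doc:
        tf[term] += weights.get(tag, default)  # unseen tags get a neutral guess
    return tf


def bm25_scores(query: list[str], docs: dict[str, Doc], weights: dict[str, float],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """BM25 ranking that uses the tag-weighted term frequencies."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n or 1.0
    df: Counter = Counter()
    for doc in docs.values():
        df.update({term for term, _tag in doc})  # document frequency per term
    scores: dict[str, float] = {}
    for doc_id, doc in docs.items():
        tf = weighted_tf(doc, weights)
        # Length normalisation uses the unweighted number of occurrences.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed Robertson/Sparck Jones IDF, kept positive by the "1 +".
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return scores


# Toy usage: d1 is judged relevant, so terms inside <title> gain weight.
docs = {
    "d1": [("xml", "title"), ("retrieval", "title"), ("model", "p")],
    "d2": [("xml", "p"), ("database", "p"), ("web", "p")],
    "d3": [("cooking", "title"), ("recipe", "p")],
}
weights = learn_tag_weights(docs, relevant={"d1"})
scores = bm25_scores(["xml", "retrieval"], docs, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(weights)  # title -> 0.6, p -> 2/7
print(ranking)  # ['d1', 'd2', 'd3']
```

In this sketch a learned weight below 1 can only shrink an occurrence's contribution. Because BM25 saturates term frequency nonlinearly, the absolute scale of the weights still interacts with `k1`. A real system would therefore tune the two together.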

Cited By

  • (2010) ENSM-SE and UJM at INEX 2010. In Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval, pp. 44-53. DOI: 10.5555/2040369.2040373. Online publication date: 13-Dec-2010.
  • (2010) Overview of the INEX 2010 ad hoc track. In Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval, pp. 1-32. DOI: 10.5555/2040369.2040371. Online publication date: 13-Dec-2010.
  • (2009) UJM at INEX 2009 ad hoc track. In Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval, pp. 88-94. DOI: 10.5555/1881065.1881077. Online publication date: 7-Dec-2009.

Published In

WI-IAT '08: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
December 2008
963 pages
ISBN:9780769534961

Publisher

IEEE Computer Society

United States

Author Tags

  1. XML
  2. probabilistic model
  3. structure
  4. tags
