Structured queries in XML retrieval

Published: 31 October 2005

Abstract

Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML content comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant retrieval approach with a structure-aware one, using the test suite of the 2004 edition of the INEX XML retrieval evaluation initiative. Theoretically, we create mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language. Our main findings are as follows. First, while structure is used in varying degrees of complexity, over half of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Third, the use of structure in queries functions as a precision-enhancing device.
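
The distinction the abstract draws between a fielded-search-like query and a query that uses the hierarchical structure of the document can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy document, the element names, and the helper functions `fielded_search` and `path_search` are hypothetical and are not the retrieval systems or query languages evaluated in the paper. Plain substring matching stands in for a real ranking function.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; not taken from the INEX collection.
DOC = """
<article>
  <title>XML retrieval</title>
  <sec>
    <title>Query languages</title>
    <p>Structured queries combine content and structure.</p>
  </sec>
  <sec>
    <title>Evaluation</title>
    <p>Retrieval effectiveness is measured on test topics.</p>
  </sec>
</article>
"""


def text_of(elem):
    """Return all text below an element, lowercased."""
    return " ".join(elem.itertext()).lower()


def fielded_search(root, field, term):
    """Fielded search: match any element named `field` that contains `term`,
    regardless of where the element sits in the tree."""
    return [e for e in root.iter(field) if term.lower() in text_of(e)]


def path_search(root, path, term):
    """Hierarchical search: match only elements reached via `path`
    (an ElementTree path expression) that contain `term`."""
    return [e for e in root.findall(path) if term.lower() in text_of(e)]


root = ET.fromstring(DOC)

# Fielded query, roughly "title:xml": hierarchy is ignored,
# so the article-level title is a match.
flat_hits = fielded_search(root, "title", "xml")

# Hierarchical query: only titles that are direct children of a section.
nested_hits = path_search(root, "./sec/title", "xml")

print([e.text for e in flat_hits])    # titles matched anywhere in the tree
print([e.text for e in nested_hits])  # titles matched only inside sections
```

In this sketch the fielded query matches the article title, while the hierarchical query returns nothing because no section title mentions the term. Treating the path as a strict filter is a simplification; the abstract reports that structure behaves more like a search hint than a strict requirement.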


Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005
854 pages
ISBN:1595931406
DOI:10.1145/1099554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML retrieval
  2. XPath
  3. full-text XML querying

Qualifiers

  • Article

Conference

CIKM05: Conference on Information and Knowledge Management
October 31 - November 5, 2005
Bremen, Germany

Acceptance Rates

CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
