Articulating information needs in XML query languages

Published: 01 October 2006

    Abstract

    Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant retrieval approach with a structure-aware one, using the test suite of the INEX XML Retrieval Evaluation Initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language.

    Our main experimental findings are as follows. First, while structure is used in varying degrees of complexity, two-thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three-quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision-enhancing device.
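    The distinction the abstract draws between fielded-search-like queries and queries with contextual constraints can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy document, the element names, and the helper functions `fielded_search` and `contextual_search` are hypothetical, and plain substring matching stands in for a real retrieval model. It is not the article's formal model or the INEX query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names are illustrative only.
DOC = """
<article>
  <title>XML retrieval</title>
  <sec>
    <title>Query languages</title>
    <p>Structured queries can improve precision.</p>
  </sec>
  <sec>
    <title>Evaluation</title>
    <p>Test collections provide topics and assessments.</p>
  </sec>
</article>
"""


def contains(elem, term):
    """True if the term occurs in the element's text, descendants included."""
    return term.lower() in "".join(elem.itertext()).lower()


def fielded_search(root, tag, term):
    """Fielded-search style: any element with this tag that contains the term.

    The position of the element in the hierarchy is ignored.
    """
    return [e for e in root.iter(tag) if contains(e, term)]


def contextual_search(root, context_tag, context_term, target_tag, target_term):
    """Context-constrained style: return target elements containing target_term
    that lie inside a context element containing context_term."""
    hits = []
    for ctx in root.iter(context_tag):
        if contains(ctx, context_term):
            hits.extend(e for e in ctx.iter(target_tag) if contains(e, target_term))
    return hits


root = ET.fromstring(DOC)
# Tag plus keyword, no hierarchy: titles mentioning "query".
fielded = fielded_search(root, "title", "query")
# Paragraphs about "structured" inside sections about "precision".
contextual = contextual_search(root, "sec", "precision", "p", "structured")
```

    In this sketch the fielded query needs only a tag name and a keyword, whereas the contextual query also constrains the surrounding element, which a flat keyword query cannot express. Treating the structural condition as a ranking hint rather than a filter, as the third finding suggests, would require a scoring function that this sketch omits.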

    Published In

    ACM Transactions on Information Systems  Volume 24, Issue 4
    October 2006
    138 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/1185877

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Full-text XML querying
    2. XML retrieval
    3. XPath

    Qualifiers

    • Article
