
An Effective and Efficient Approach for Keyword-Based XML Retrieval

  • Conference paper
Advances in Web-Age Information Management (WAIM 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3739)

Abstract

IR-style keyword-based search on XML documents has become the most common tool for querying XML, because users do not need to know the structure of the target XML document before constructing a query. For a keyword-based XML search engine, the key issue is how to efficiently return sets of meaningfully related nodes in response to a user's query. A common solution in current approaches is to store the relationship between every pair of nodes in an XML document in an index, which leads to serious storage overhead. In this paper, we propose an enhanced inverted index structure (PN-Inverted Index) that stores path information in addition to node IDs, and we adopt the concept of the lowest common ancestor (LCA) and extend it to PLCA. Using these concepts, we design efficient algorithms that check the relationships among an arbitrary number of nodes. Unlike existing approaches, ours does not create an additional relationship index; it uses only the inverted index that IR-style keyword search engines commonly maintain. Experimental results show that, while guaranteeing meaningful answers, our search engine offers significant performance benefits. Although the inverted index grows larger, the total size of the search engine's indices is smaller than that of existing approaches.
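The abstract describes an inverted index whose postings carry path information next to each node ID, with node relationships checked through the lowest common ancestor (LCA). The Python sketch below illustrates only that general idea, under assumptions that do not come from the paper: node IDs are Dewey-style labels, the stored path is the root-to-node tag sequence, and only each element's own text is indexed. The names `build_index`, `lca`, `Posting`, and `SAMPLE` are hypothetical. It does not reproduce the actual PN-Inverted Index layout or the PLCA extension.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical posting: (Dewey-style node ID, root-to-node tag path).
# Using Dewey labels as node IDs is an assumption made for this sketch.
Posting = Tuple[Tuple[int, ...], Tuple[str, ...]]


def build_index(xml_text: str) -> Dict[str, List[Posting]]:
    """Map each lowercase word to the element nodes whose own text contains it."""
    index: Dict[str, List[Posting]] = defaultdict(list)

    def visit(elem: ET.Element, dewey: Tuple[int, ...], tags: Tuple[str, ...]) -> None:
        tags = tags + (elem.tag,)
        # dict.fromkeys drops repeated words within one node while keeping order.
        for word in dict.fromkeys((elem.text or "").lower().split()):
            index[word].append((dewey, tags))
        for i, child in enumerate(elem):
            visit(child, dewey + (i,), tags)

    visit(ET.fromstring(xml_text), (0,), ())
    return dict(index)


def lca(postings: List[Posting]) -> Posting:
    """LCA of any number of nodes: the longest common prefix of their Dewey IDs."""
    if not postings:
        raise ValueError("need at least one posting")
    ids = [dewey for dewey, _ in postings]
    depth = 0
    for column in zip(*ids):
        if len(set(column)) != 1:
            break
        depth += 1
    # Dewey IDs and tag paths have equal length, so the LCA's tag path is the
    # same-length prefix of any member's tag path.
    return ids[0][:depth], postings[0][1][:depth]


SAMPLE = """<dblp>
  <article><author>Li</author><title>keyword search</title></article>
  <article><author>Yu</author><title>xml index</title></article>
</dblp>"""

if __name__ == "__main__":
    idx = build_index(SAMPLE)
    print(lca([idx["li"][0], idx["keyword"][0]]))  # ((0, 0), ('dblp', 'article'))
    print(lca([idx["li"][0], idx["yu"][0]]))       # ((0,), ('dblp',))
```

Because every posting already carries its ancestry, the LCA of any number of keyword matches follows from a prefix comparison at query time, with no separate pairwise relationship index. Judging whether the matched nodes are meaningfully related is the job of the paper's PLCA-based algorithms, which this sketch does not model.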

Supported by the National Natural Science Foundation of China (60173051) and the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of the Ministry of Education of China.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Gong, J., Wang, D., Yu, G. (2005). An Effective and Efficient Approach for Keyword-Based XML Retrieval. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_6

  • DOI: https://doi.org/10.1007/11563952_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29227-2

  • Online ISBN: 978-3-540-32087-6

  • eBook Packages: Computer Science, Computer Science (R0)
