Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1066157.1066217acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Efficient keyword search for smallest LCAs in XML databases

Published: 14 June 2005 Publication History

Abstract

Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees in order to outperform prior algorithms by orders of magnitude when the query contains keywords with significantly different frequencies. The Scan Eager variant is tuned for the case where the keywords have similar frequencies. We analytically and experimentally evaluate two variants of the Eager algorithm, along with the Stack algorithm [13]. We also present the XKSearch system, which utilizes the Indexed Lookup Eager, Scan Eager and Stack algorithms and a demo of which on DBLP data is available at http://www.db.ucsd.edu/projects/xksearch. Finally, we extend the Indexed Lookup Eager algorithm to answer Lowest Common Ancestor (LCA) queries.

References

[1]
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
[2]
V. Aguilera et al. Querying XML documents in XYleme. In SIGIR Workshop on XML and Information Retrieval, 2000.
[3]
S. Amer-Yahia, S. Cho, and D. Srivastava. Tree pattern relaxation. In EDBT, 2002.
[4]
BerkeleyDB. http://www.sleepycat.com/.
[5]
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE, 2002.
[6]
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1--7):107--117, 1998.
[7]
Z. Chen, H. Jagadish, F. Korn, and N. Koudas. Counting twig matches in a tree. In ICDE, 2001.
[8]
S. Cohen, J. Namou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, 2003.
[9]
D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. In WWW9, 2000.
[10]
N. Fuhr and K. Grojohann. XIRQL: A Query Language for Information Retrieval in XML documents. In SIGIR, 2001.
[11]
H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice-Hall, 2000.
[12]
R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity Search in Databases. In VLDB, 1998.
[13]
L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. In SIGMOD, 2003.
[14]
V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword Proximity Search in XML Trees. Available at http://www.db.ucsd.edu/publications/treeproximity.pdf.
[15]
V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, 2002.
[16]
V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
[17]
Q. Li and B. Moon. Indexing and Querying XML data for regular path expressions. In VLDB, 2001.
[18]
Y. Li, C. Yu, and H. V. Jagadish. Schema-free xquery. In VLDB, 2004.
[19]
J. Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2):27--33, 2001.
[20]
B. Schieber and U. Vishkin. On finding lowest common ancestors: Simplification and parallelization. SIAM J. Computing, 17(6):1253--1262, 1988.
[21]
A. Schmidt, M. L. Kersten, and M. Windhouwer. Querying XML documents made easy: Nearest concept queries. In ICDE, 2001.
[22]
D. Srivastava et al. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
[23]
I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. In SIGMOD, 2002.
[24]
A. Theobald and G. Weikum. Adding relevance to XML. In WebDB, 2000.
[25]
A. Theobald and G. Weikum. The index-based XXL search engine for querying XML data with relevance ranking. In EDBT, 2002.
[26]
Z. Wen. New algorithms for the LCA problem and the binary tree reconstruction problem. Information Processing. Lett, 51(1): 11--16, 1994.
[27]
XYZFind. http://www.searchtools.com/tools/xyzfind.html.

Cited By

View all
  1. Efficient keyword search for smallest LCAs in XML databases

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
    June 2005
    990 pages
    ISBN:1595930604
    DOI:10.1145/1066157
    • Conference Chair:
    • Fatma Ozcan
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2005

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS05
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)11
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 12 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Temporal JSON Keyword SearchProceedings of the ACM on Management of Data10.1145/36549802:3(1-27)Online publication date: 30-May-2024
    • (2024)Building an open-source collaborative platform for migration researchKnowledge-Based Systems10.1016/j.knosys.2024.111823299:COnline publication date: 18-Oct-2024
    • (2023)Keyword Coupling Query of Spatiotemporal XML DataUncertain Spatiotemporal Data Management for the Semantic Web10.4018/978-1-6684-9108-9.ch012(211-226)Online publication date: 15-Dec-2023
    • (2021) Adaptive query relaxation and top‐ k result sorting of fuzzy spatiotemporal data based on XML International Journal of Intelligent Systems10.1002/int.22781Online publication date: 13-Dec-2021
    • (2020)A survey of typical attributed graph queriesWorld Wide Web10.1007/s11280-020-00849-024:1(297-346)Online publication date: 20-Nov-2020
    • (2019)Review on Keyword Search and Ranking Techniques for Semi-Structured DataKnowledge-Intensive Economies and Opportunities for Social, Organizational, and Technological Growth10.4018/978-1-5225-7347-0.ch013(248-270)Online publication date: 2019
    • (2019)A Crowdsourcing-based Approach for Efficient XML Keyword Search2019 5th International Conference on Web Research (ICWR)10.1109/ICWR.2019.8765281(16-21)Online publication date: Apr-2019
    • (2019)Application Research of XML Parsing Technology Based on AndroidJournal of Physics: Conference Series10.1088/1742-6596/1314/1/0122131314(012213)Online publication date: 6-Nov-2019
    • (2019)XPloreRankWorld Wide Web10.1007/s11280-018-0630-x22:4(1727-1750)Online publication date: 1-Jul-2019
    • (2018)A structure-based approach of keyword querying for fuzzy XML dataInternational Journal of Knowledge-based and Intelligent Engineering Systems10.3233/KES-18037922:2(125-140)Online publication date: 25-Jul-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media