DOI: 10.1145/1367497.1367656
Poster

Sailer: an effective search engine for unified retrieval of heterogeneous xml and web documents

Published: 21 April 2008

Abstract

This paper studies the problem of unified ranked retrieval over heterogeneous XML documents and Web data. We propose Sailer, a search engine that answers keyword queries over such heterogeneous data adaptively and flexibly. We model both Web pages and XML documents as graphs. We introduce the concept of pivotal trees for answering keyword queries and present a method that identifies the top-k highest-ranked pivotal trees in these graphs. Moreover, we propose indexes that support effective unified ranked retrieval. We have conducted an extensive experimental study on real datasets, and the results show that Sailer achieves both high search efficiency and high accuracy, and significantly outperforms existing approaches.
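
The abstract does not define pivotal trees or Sailer's ranking function, so the following is not Sailer's algorithm. It is a minimal, generic sketch of keyword search over a unified graph, assuming a distinct-root style of answer (in the spirit of earlier graph keyword-search systems such as BANKS): XML elements and Web pages become nodes of one graph, an inverted index maps terms to nodes, and each candidate root is scored by the total hop distance to the nearest node matching each keyword. The class name UnifiedGraph, the node ids, the max_depth limit, and the distance-sum score are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
import heapq


class UnifiedGraph:
    """Hypothetical unified data graph for XML elements and Web pages.

    XML parent-child edges and Web hyperlinks are both stored as plain
    undirected edges; this simplification is an assumption of the sketch.
    """

    def __init__(self):
        self.adj = defaultdict(set)    # node id -> neighbouring node ids
        self.index = defaultdict(set)  # term -> node ids (inverted index)

    def add_node(self, node, text):
        self.adj.setdefault(node, set())  # keep isolated nodes in the graph
        for term in text.lower().split():
            self.index[term].add(node)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def _hops_from(self, root, max_depth):
        """Breadth-first hop distances from root, at most max_depth hops."""
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            if dist[u] == max_depth:
                continue  # do not expand past the depth limit
            for v in self.adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def top_k(self, query, k=3, max_depth=3):
        """Return up to k (cost, root) pairs, lowest total hop cost first.

        A root qualifies if, for every keyword, some matching node lies
        within max_depth hops; its cost is the sum of the nearest hop
        distances. Ties are broken by node id.
        """
        keywords = [t.lower() for t in query.split()]
        if not keywords or any(not self.index.get(t) for t in keywords):
            return []  # a keyword with no matching node yields no answer tree
        candidates = []
        for root in list(self.adj):
            dist = self._hops_from(root, max_depth)
            cost = 0
            for t in keywords:
                hops = [dist[n] for n in self.index[t] if n in dist]
                if not hops:
                    break  # this root cannot reach the keyword
                cost += min(hops)
            else:
                candidates.append((cost, root))
        return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    g = UnifiedGraph()
    g.add_node("xml:paper", "paper")
    g.add_node("xml:title", "keyword search xml")
    g.add_node("xml:author", "li feng")
    g.add_node("web:home", "homepage of li")
    g.add_edge("xml:paper", "xml:title")   # XML containment
    g.add_edge("xml:paper", "xml:author")  # XML containment
    g.add_edge("web:home", "xml:paper")    # hyperlink from a Web page
    print(g.top_k("xml feng", k=4))
    # [(2, 'xml:author'), (2, 'xml:paper'), (2, 'xml:title'), (4, 'web:home')]
```

The three-way tie at cost 2 shows the limit of a pure hop-count score: a practical ranking would also weight terms and structure, which is where a dedicated scheme such as the paper's pivotal-tree ranking would come in.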




    Information

    Published In

    ACM Conferences
    WWW '08: Proceedings of the 17th international conference on World Wide Web
    April 2008
    1326 pages
    ISBN:9781605580852
    DOI:10.1145/1367497
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. keyword search
    2. unified keyword search
    3. web pages
    4. xml

    Qualifiers

    • Poster

    Conference

    WWW '08

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 2
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 24 Jan 2025

    Cited By

    • (2022) Natural Gujarati Language Interface to Direct Retrieval from Diverse Indian Agriculture Sources. Towards Excellence, pp. 55-83. DOI: 10.37867/TE140308. Online publication date: 30-Sep-2022.
    • (2019) Semantics Based Web Ranking Using a Robust Weight Scheme. International Journal of Web Portals 11(1), pp. 56-72. DOI: 10.4018/IJWP.2019010104. Online publication date: Jan-2019.
    • (2012) Capturing Semantics of Web Page using Weighted TAG-Tree for Information Retrieval. International Journal of Asian Business and Information Management 3(4), pp. 7-24. DOI: 10.4018/jabim.2012100102. Online publication date: 1-Oct-2012.
    • (2012) Keyword-Based Search over Semantic Data. In Semantic Search over the Web, pp. 159-192. DOI: 10.1007/978-3-642-25008-8_7. Online publication date: 28-Jan-2012.
    • (2011) KEMB. IEEE Transactions on Knowledge and Data Engineering 23(7), pp. 1035-1049. DOI: 10.1109/TKDE.2010.159. Online publication date: 1-Jul-2011.
    • (2011) An effective 3-in-1 keyword search method over heterogeneous data sources. Information Systems 36(2), pp. 248-266. DOI: 10.1016/j.is.2008.08.001. Online publication date: 1-Apr-2011.
    • (2011) Providing built-in keyword search capabilities in RDBMS. The VLDB Journal: The International Journal on Very Large Data Bases 20(1), pp. 1-19. DOI: 10.1007/s00778-010-0188-4. Online publication date: 1-Feb-2011.
    • (2009) SAIL. Information Sciences: an International Journal 179(21), pp. 3745-3762. DOI: 10.1016/j.ins.2009.06.025. Online publication date: 1-Oct-2009.
    • (2008) An effective and versatile keyword search engine on heterogenous data sources. Proceedings of the VLDB Endowment 1(2), pp. 1452-1455. DOI: 10.14778/1454159.1454198. Online publication date: 1-Aug-2008.
    • (2008) Race. Proceedings of the 17th international conference on World Wide Web, pp. 1045-1046. DOI: 10.1145/1367497.1367648. Online publication date: 21-Apr-2008.
