Short paper, ACM CIKM Conference Proceedings
DOI: 10.1145/2396761.2398581

Diversifying query results on semi-structured data

Published: 29 October 2012

Abstract

Queries on the web can easily return a large number of results. Result diversification, a process by which a query returns the k most diverse matches, helps the user better understand and explore such large result sets. Computing the diverse subset from a large set of results requires a massive number of pairwise distance computations, as well as finding the subset that maximizes the total pairwise distance; the latter problem is NP-hard and calls for an efficient approximation algorithm.
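To make the objective concrete: given some distance function over results, the goal is the k-subset with the largest sum of pairwise distances (often called max-sum diversification). The brute-force sketch below assumes a generic `dist` callable and plain Python sequences; the function names are hypothetical and not taken from the paper. It evaluates every one of the C(n, k) candidate subsets, which shows why exact search is impractical and an approximation is needed.

```python
from itertools import combinations
from math import comb
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def total_pairwise_distance(subset: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Max-sum diversity objective: sum of dist over all unordered pairs."""
    return sum(dist(a, b) for a, b in combinations(subset, 2))


def exact_max_sum_diverse(results: Sequence[T], k: int,
                          dist: Callable[[T, T], float]) -> List[T]:
    """Brute force over all C(n, k) subsets; only feasible for tiny inputs."""
    if not 0 <= k <= len(results):
        raise ValueError("k must be between 0 and len(results)")
    best = max(combinations(results, k),
               key=lambda s: total_pairwise_distance(s, dist))
    return list(best)


if __name__ == "__main__":
    # Toy "results": numbers on a line, with absolute difference as the distance.
    results = [0, 1, 2, 9, 10]
    print(exact_max_sum_diverse(results, 2, lambda a, b: abs(a - b)))  # [0, 10]
    # The number of candidate subsets explodes: C(1000, 10) is about 2.6e23.
    print(comb(1000, 10))
```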
The problem becomes more difficult when querying semi-structured data, since diversity can occur not only in the document content but also (and more importantly) in the document structure; thus one needs to efficiently measure the structural differences between results. The tree edit distance is the standard choice, but it is too expensive for large result sets. Moreover, the generalized tree edit distance ignores both the context of the query and the content of the documents, resulting in poor diversification. We present a novel algorithm for meaningful diversification that considers both the structural context of the query and the content of the matched results when computing pairwise distances. Our algorithm is an order of magnitude faster than the tree edit distance and comes with an elegant worst-case guarantee.
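For reference, the sketch below computes the unit-cost tree edit distance (delete, insert, or relabel one node) using the classical recursive forest recurrence that Zhang and Shasha's algorithm evaluates efficiently. It illustrates the baseline the abstract compares against, not the paper's query-aware distance. The `(label, children)` tuple encoding and the function names are assumptions made for illustration. The distance looks only at node labels, so it sees neither the query context nor the text content. Even with memoization it visits many subforest pairs, and diversification needs it for every pair of results.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. All tuples, so results can be memoized.
Tree = Tuple[str, tuple]


def tree_edit_distance(t1: Tree, t2: Tree) -> int:
    """Unit-cost tree edit distance (delete, insert, relabel a node)."""

    @lru_cache(maxsize=None)
    def forest_dist(f: tuple, g: tuple) -> int:
        if not f and not g:
            return 0
        if not g:
            # Delete the rightmost root of f; its children take its place.
            _, kids = f[-1]
            return forest_dist(f[:-1] + kids, g) + 1
        if not f:
            # Insert the rightmost root of g.
            _, kids = g[-1]
            return forest_dist(f, g[:-1] + kids) + 1
        (lv, kv), (lw, kw) = f[-1], g[-1]
        return min(
            forest_dist(f[:-1] + kv, g) + 1,   # delete rightmost root v of f
            forest_dist(f, g[:-1] + kw) + 1,   # insert rightmost root w of g
            forest_dist(kv, kw)                # map v to w: their child forests...
            + forest_dist(f[:-1], g[:-1])      # ...plus the remaining forests...
            + (0 if lv == lw else 1),          # ...plus the relabel cost
        )

    return forest_dist((t1,), (t2,))


if __name__ == "__main__":
    r1 = ("book", (("title", ()), ("author", ())))
    r2 = ("book", (("title", ()),))
    print(tree_edit_distance(r1, r2))  # 1 (delete the author node)
```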
We also present a novel algorithm that finds the top-k diverse subset of matches in time linear in the size of the result set. We experimentally demonstrate the utility of our algorithms as a plugin for standard query processors, without introducing significant error or latency to the output.
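This abstract does not describe the paper's own top-k selection algorithm. For comparison only, a common greedy heuristic for max-sum diversification picks results one at a time, each time choosing the result with the largest summed distance to those already chosen. Keeping running sums means it makes O(nk) distance calls, which is linear in the result-set size n for a fixed k. The function name, the seeding rule (start from the first result, assumed to be the top-ranked match) and the `dist` callable are all assumptions. No approximation bound is claimed for this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_diverse_top_k(results: Sequence[T], k: int,
                         dist: Callable[[T, T], float]) -> List[T]:
    """Greedy max-sum heuristic using O(n * k) calls to dist."""
    n = len(results)
    k = min(k, n)
    if k <= 0:
        return []
    # Assumption: results[0] is the top-ranked match and serves as the seed.
    chosen = [0]
    taken = [False] * n
    taken[0] = True
    # gain[i] = summed distance from results[i] to every result chosen so far.
    gain = [dist(r, results[0]) for r in results]
    while len(chosen) < k:
        # Pick the untaken result farthest (in summed distance) from the chosen set.
        best = max((i for i in range(n) if not taken[i]), key=gain.__getitem__)
        chosen.append(best)
        taken[best] = True
        # One pass per pick updates every remaining gain incrementally.
        for i in range(n):
            if not taken[i]:
                gain[i] += dist(results[i], results[best])
    return [results[i] for i in chosen]


if __name__ == "__main__":
    print(greedy_diverse_top_k([0, 1, 2, 9, 10], 3, lambda a, b: abs(a - b)))  # [0, 10, 1]
```

On this toy input the greedy pick has a total pairwise distance of 20. For these five points that is the best any 3-subset can reach, because it requires both endpoints 0 and 10. In general, a greedy choice need not be optimal.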

    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN:9781450311564
    DOI:10.1145/2396761
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. diversity
    2. semi-structured data
    3. xml

    Qualifiers

    • Short-paper

    Conference

    CIKM'12
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2018) Searching Lead to Better Search Intension for Keyword. Smart Trends in Information Technology and Computer Communications, pp. 120-128. DOI: 10.1007/978-981-13-1423-0_14. Online publication date: 21-Aug-2018.
    • (2017) Trading Off Popularity for Diversity in the Results Sets of Keyword Queries on Linked Data. Web Engineering, pp. 151-170. DOI: 10.1007/978-3-319-60131-1_9. Online publication date: 1-Jun-2017.
    • (2017) Exploratory Ad-Hoc Analytics for Big Data. Handbook of Big Data Technologies, pp. 365-407. DOI: 10.1007/978-3-319-49340-4_11. Online publication date: 26-Feb-2017.
    • (2016) Diversifying the Results of Keyword Queries on Linked Data. Web Information Systems Engineering – WISE 2016, pp. 199-207. DOI: 10.1007/978-3-319-48740-3_14. Online publication date: 2-Nov-2016.
    • (2016) Diversification of Keyword Query Result Patterns. Web-Age Information Management, pp. 171-183. DOI: 10.1007/978-3-319-39958-4_14. Online publication date: 2-Jun-2016.
    • (2015) Context-Based Diversification for Keyword Queries Over XML Data. IEEE Transactions on Knowledge and Data Engineering, 27(3), pp. 660-672. DOI: 10.1109/TKDE.2014.2334297. Online publication date: 1-Mar-2015.
    • (2014) User effort minimization through adaptive diversification. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 203-212. DOI: 10.1145/2623330.2623610. Online publication date: 24-Aug-2014.
