Flexible and efficient querying and ranking on hyperlinked data sources

Research article
DOI: 10.1145/1516360.1516425
Published: 24 March 2009

Abstract

There has been an explosion of hyperlinked data in many domains, e.g., the biological Web. Expressive query languages and effective ranking techniques are required to convert this data into browsable knowledge. We propose the Graph Information Discovery (GID) framework to support sophisticated user queries on a rich web of annotated and hyperlinked data entries, where query answers need to be ranked according to customized ranking criteria, e.g., PageRank or ObjectRank. GID has a data model that includes a schema graph and a data graph, and an intuitive query interface. The GID framework lets users formulate queries as sequences of hard filters (selection predicates) and soft filters (ranking criteria); it can also be combined with other specialized graph query languages to enhance their ranking capabilities. GID queries have well-defined semantics and are implemented by a set of physical operators, each of which produces a ranked result graph. We discuss rewriting opportunities that allow efficient evaluation of GID queries. Soft filters are a key feature of GID and are implemented using authority flow ranking techniques; these rankings are query dependent and expensive to compute at runtime. We present approximate optimization techniques for GID soft filter queries based on the properties of random walks, using novel path-length-bound and graph-sampling approximation techniques. We experimentally validate our optimization techniques on large biological and bibliographic datasets. Our techniques can produce high-quality top-K answers with savings of up to an order of magnitude compared to the evaluation time of the exact solution.
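
To make the hard-filter and soft-filter distinction concrete, the following is a minimal Python sketch, not the GID implementation. It assumes a toy data graph stored as an adjacency dictionary, a hard filter expressed as a predicate over node attributes, and a soft filter computed as ObjectRank-style authority flow from a base set. The names `hard_filter`, `soft_filter`, `top_k`, and the parameter `max_path_len` are hypothetical. Truncating the iteration after `max_path_len` steps counts only walks of at most that length, which is one plausible reading of a path-length bound; the paper's actual approximation techniques may differ.

```python
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # node -> list of out-neighbors


def hard_filter(attrs: Dict[Hashable, dict],
                predicate: Callable[[dict], bool]) -> Set[Hashable]:
    """Selection predicate: keep the nodes whose attributes satisfy it."""
    return {node for node, a in attrs.items() if predicate(a)}


def soft_filter(graph: Graph, base_set: Set[Hashable],
                damping: float = 0.85,
                max_path_len: int = 50) -> Dict[Hashable, float]:
    """Authority flow from base_set, counting walks of length <= max_path_len.

    Assumption: every edge target is also a key of `graph`. Authority that
    reaches a node without out-edges is dropped, which is a simplification.
    """
    base = [n for n in graph if n in base_set]
    if not base:
        return {n: 0.0 for n in graph}
    start = (1.0 - damping) / len(base)
    walk = {n: (start if n in base_set else 0.0) for n in graph}
    score = dict(walk)  # walks of length 0
    for _ in range(max_path_len):
        nxt = {n: 0.0 for n in graph}
        for u, outs in graph.items():
            if outs and walk[u] > 0.0:
                share = damping * walk[u] / len(outs)
                for v in outs:
                    nxt[v] += share
        walk = nxt  # authority carried by walks one edge longer
        for n in graph:
            score[n] += walk[n]
    return score


def top_k(score: Dict[Hashable, float], candidates: Set[Hashable],
          k: int) -> List[Hashable]:
    """Rank the candidates by score, breaking ties by node name."""
    return sorted(candidates, key=lambda n: (-score[n], str(n)))[:k]


# Toy data graph: genes link to papers, papers cite a third paper.
graph: Graph = {"g1": ["p1", "p2"], "g2": ["p2"],
                "p1": ["p3"], "p2": ["p3"], "p3": []}
attrs = {"g1": {"type": "gene"}, "g2": {"type": "gene"},
         "p1": {"type": "paper"}, "p2": {"type": "paper"},
         "p3": {"type": "paper"}}

genes = hard_filter(attrs, lambda a: a["type"] == "gene")    # hard filter
papers = hard_filter(attrs, lambda a: a["type"] == "paper")  # hard filter
full = soft_filter(graph, genes, max_path_len=10)            # soft filter
short = soft_filter(graph, genes, max_path_len=1)            # bounded variant

print(top_k(full, papers, 3))
print(top_k(short, papers, 3))
```

In this toy graph the paper `p3` is two edges away from the base set, so a bound of one edge never reaches it and the bounded ranking differs from the full one. This illustrates the trade-off between answer quality and evaluation cost that a path-length bound introduces.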

    Published In

    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ObjectRank
    2. authority flow ranking
    3. hard filters
    4. soft filters

    Qualifiers

    • Research-article

    Conference

    EDBT/ICDT '09: EDBT/ICDT '09 joint conference
    March 24 - 26, 2009
    Saint Petersburg, Russia

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%

    Cited By

    • (2018) An authority-flow based ranking approach to discover potential novel associations between Linked Data. Semantic Web 5:1, 23-46. DOI: 10.5555/2786093.2786097. Online publication date: 13-Dec-2018
    • (2018) Approximation and relaxation of semantic web path queries. Web Semantics: Science, Services and Agents on the World Wide Web 40:C, 1-21. DOI: 10.1016/j.websem.2016.08.001. Online publication date: 20-Dec-2018
    • (2015) Edge-Weighted Personalized PageRank. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1325-1334. DOI: 10.1145/2783258.2783278. Online publication date: 10-Aug-2015
    • (2014) Efficient Ranking on Entity Graphs with Personalized Relationships. IEEE Transactions on Knowledge and Data Engineering 26:4, 850-863. DOI: 10.1109/TKDE.2013.52. Online publication date: Apr-2014
    • (2013) Towards query model integration. Proceedings of the Joint EDBT/ICDT 2013 Workshops, 185-194. DOI: 10.1145/2457317.2457350. Online publication date: 18-Mar-2013
    • (2013) Efficient search algorithm for SimRank. Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), 589-600. DOI: 10.1109/ICDE.2013.6544858. Online publication date: 8-Apr-2013
    • (2011) Effective ranking techniques for book review retrieval based on the structural feature. Proceedings of the 5th international conference on Convergence and hybrid information technology, 360-367. DOI: 10.5555/2045005.2045053. Online publication date: 22-Sep-2011
    • (2011) Using medians to generate consensus rankings for biological data. Proceedings of the 23rd international conference on Scientific and statistical database management, 73-90. DOI: 10.5555/2032397.2032404. Online publication date: 20-Jul-2011
    • (2011) Ranking objects by following paths in entity-relationship graphs. Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management, 11-18. DOI: 10.1145/2065003.2065008. Online publication date: 28-Oct-2011
    • (2011) Using Medians to Generate Consensus Rankings for Biological Data. Scientific and Statistical Database Management, 73-90. DOI: 10.1007/978-3-642-22351-8_5. Online publication date: 2011
