Research article

SPARQL Rewriting: Towards Desired Results

Authors: Lei Chen

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Pages 1979–1993

https://doi.org/10.1145/3318464.3389695

Published: 31 May 2020

Abstract

Recent years have witnessed the emergence of various applications on knowledge graphs, which are often represented as RDF graphs. However, due to the lack of a data schema and the complexity of the SPARQL language, there is usually a gap between what the user really wants and the actual meaning of a SPARQL query, especially when the query itself is complicated. In this paper, we try to narrow this gap by modifying a given query with a set of modifiers, so that its result approaches a user-provided example set. Specifically, we model this problem as two individual sub-problems, query-restricting and query-relaxing, both of which are shown to be NP-hard. We further prove that, unless P=NP, query-restricting has no polynomial-time approximation scheme (PTAS), and query-relaxing has no polynomial-time constant-factor approximation algorithm. Despite their hardness, we propose a (1−1/e)-approximation method for query-restricting and two heuristics for query-relaxing. Extensive experiments on real-world knowledge graphs evaluate the effectiveness and efficiency of the proposed solutions.
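The abstract does not spell out the restricting algorithm. As a rough illustration of where a (1−1/e) guarantee typically comes from, the sketch below assumes a simplified model. Each candidate modifier is a restriction, such as an extra triple pattern or FILTER, that keeps a known subset of the original results. Applying several restrictions keeps only the results that all of them keep. Modifiers that would drop a user example are disallowed. Under these assumptions, choosing at most k modifiers to exclude as many unwanted results as possible is an instance of maximum coverage, for which the standard greedy algorithm achieves a (1−1/e) approximation. The function `greedy_restrict`, the modifier labels, and the toy data are hypothetical; this is not the authors' method.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_restrict(
    results: Set[str],
    examples: Set[str],
    candidates: Dict[str, Set[str]],
    k: int,
) -> Tuple[List[str], Set[str]]:
    """Greedily pick up to k restricting modifiers (simplified, assumed model).

    `candidates` maps a hypothetical modifier label to the subset of
    `results` the query would still return if only that modifier were added.
    Conjunctive restrictions compose by intersection, so the results excluded
    by several modifiers are the union of what each one excludes.
    """
    # Admissible modifiers must keep every user-provided example.
    excluded_by = {
        name: results - kept
        for name, kept in candidates.items()
        if examples <= kept
    }
    chosen: List[str] = []
    removed: Set[str] = set()
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        for name, excluded in excluded_by.items():
            if name in chosen:
                continue
            # Marginal gain: unwanted results this modifier would newly remove.
            gain = len(excluded - removed)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining modifier removes a new result
            break
        chosen.append(best)
        removed |= excluded_by[best]
    return chosen, results - removed


# Toy data (hypothetical): six answers of the original query, two of which
# the user supplied as examples.
results = {"a", "b", "c", "d", "e", "f"}
examples = {"a", "b"}
candidates = {
    "filter_year": {"a", "b", "c", "d"},     # excludes e, f
    "type_city": {"a", "b", "e"},            # excludes c, d, f
    "in_europe": {"a", "c", "d", "e", "f"},  # drops example b: inadmissible
    "lang_en": {"a", "b", "c", "e", "f"},    # excludes d
}

chosen, remaining = greedy_restrict(results, examples, candidates, k=2)
print(chosen, sorted(remaining))  # → ['type_city', 'filter_year'] ['a', 'b']
```

With k=2 the greedy step first takes `type_city` (three new exclusions) and then `filter_year` (one new exclusion, `e`), leaving exactly the example set; `in_europe` is never considered because it would drop example `b`. Query-relaxing does not fit this coverage model, which is consistent with the abstract's result that it admits no constant-factor approximation unless P=NP, so the sketch covers only the restricting side.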

Supplementary Material

MP4 File (3318464.3389695.mp4): Presentation Video (127.47 MB)


Cited By

Stadler C, Saleem M, Mehmood Q, Buil-Aranda C, Dumontier M, Hogan A, Ngonga Ngomo A (2024). LSQ 2.0: A linked dataset of SPARQL query logs. Semantic Web 15:1 (167-189). Online publication date: 12-Jan-2024.
https://doi.org/10.3233/SW-223015

Chaudhry H, Rossi M (2024). Optimising Queries for Pattern Detection Over Large Scale Temporally Evolving Graphs. IEEE Access 12 (86790-86808). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3417352

Almendros-Jiménez J, Becerra-Terón A, Moreno G, Riaza J (2024). Tuning Fuzzy SPARQL Queries. International Journal of Approximate Reasoning (109209). Online publication date: Apr-2024.
https://doi.org/10.1016/j.ijar.2024.109209

Index Terms

SPARQL Rewriting: Towards Desired Results
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation
      2. Query suggestion


Information

Published In


SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

June 2020

2925 pages

ISBN:9781450367356

DOI:10.1145/3318464

General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)

Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)

Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGMOD/PODS '20: International Conference on Management of Data

June 14 - 19, 2020, Portland, OR, USA

Sponsor: SIGMOD

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 8
Total Downloads: 568
Downloads (Last 12 months): 37
Downloads (Last 6 weeks): 4

Reflects downloads up to 30 Aug 2024

Citations

Zeng J, U L, Yan X, Li Y, Han M, Tang B (2023). Extracting Top-k Frequent and Diversified Patterns in Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (1-18). Online publication date: 2023.
https://doi.org/10.1109/TKDE.2022.3233594

Thapa R, Giese M (2023). Optimizing SPARQL Queries with SHACL. The Semantic Web – ISWC 2023 (41-60). Online publication date: 27-Oct-2023.
https://doi.org/10.1007/978-3-031-47240-4_3

Lin X, Jiang D (2022). A Two-Phase Method for Optimization of the SPARQL Query. Journal of Sensors 2022 (1-12). Online publication date: 25-Aug-2022.
https://doi.org/10.1155/2022/4624856

Xue X, Tsai P, Chen J (2022). Large-scale complex ontology matching through anchor-based semantic partitioning technique and confidence matrix based evolutionary algorithm. Applied Soft Computing 128 (109516). Online publication date: Oct-2022.
https://doi.org/10.1016/j.asoc.2022.109516

Zeng J, U L, Yan X, Han M, Tang B (2021). Fast Core-based Top-k Frequent Pattern Discovery in Knowledge Graphs. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (936-947). Online publication date: Apr-2021.
https://doi.org/10.1109/ICDE51399.2021.00086
