We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning that graphs are both the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals and strikes a careful balance between path query expressivity and evaluation complexity.
With a description of the G-CORE language, designed by the Linked Data Benchmark Council (LDBC) Graph Query Language Task Force, this paper presents a standardization proposal of a graph query language for property graphs.
In section 1, the authors focus on three main challenges for existing graph query languages: "composability," "paths as first-class citizens," and "capture the core of available languages." By composability the authors mean that "graphs are the input and the output of queries." Their approach addresses the third challenge by taking "the successful functionalities of current [graph query] languages ... to develop the next generation of graph languages."
Section 2 discusses the second challenge and defines an extended path property graph (PPG) model. PPGs allow for multi-valued properties and stored paths. Paths have an identity and can also have labels. Queries on paths are also enabled.
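The PPG model described above can be pictured with a small data structure. The following is a minimal sketch in Python, not the paper's formal definition: the class name `PathPropertyGraph`, its method names, and the sample identifiers are all hypothetical, and allowing a stored path to traverse an edge in either direction is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PathPropertyGraph:
    """Toy PPG: nodes, edges, and stored paths all carry an id, labels, and properties."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # edge id -> (source node, target node)
    paths: dict = field(default_factory=dict)   # path id -> [node, edge, node, ...]
    labels: dict = field(default_factory=dict)  # object id -> set of labels
    props: dict = field(default_factory=dict)   # (object id, key) -> set of values

    def _annotate(self, oid, labels, props):
        # Properties are multi-valued: each key maps to a set of values.
        self.labels[oid] = set(labels)
        for key, values in props.items():
            self.props[(oid, key)] = set(values)

    def add_node(self, nid, labels=(), **props):
        self.nodes.add(nid)
        self._annotate(nid, labels, props)

    def add_edge(self, eid, src, dst, labels=(), **props):
        if src not in self.nodes or dst not in self.nodes:
            raise ValueError("edge endpoints must be existing nodes")
        self.edges[eid] = (src, dst)
        self._annotate(eid, labels, props)

    def add_path(self, pid, sequence, labels=(), **props):
        # A stored path alternates node, edge, node, ... and starts and ends on a node.
        if len(sequence) % 2 == 0 or sequence[0] not in self.nodes:
            raise ValueError("path must start and end with an existing node")
        for i in range(1, len(sequence), 2):
            a, e, b = sequence[i - 1], sequence[i], sequence[i + 1]
            # Assumption of this sketch: an edge may be walked in either direction.
            if self.edges.get(e) not in ((a, b), (b, a)):
                raise ValueError(f"edge {e!r} does not connect {a!r} and {b!r}")
        self.paths[pid] = list(sequence)
        self._annotate(pid, labels, props)


g = PathPropertyGraph()
g.add_node("n1", labels={"Person"}, employer={"Acme", "HAL"})  # multi-valued property
g.add_node("n2", labels={"Person"})
g.add_edge("e1", "n1", "n2", labels={"knows"})
g.add_path("p1", ["n1", "e1", "n2"], labels={"shortcut"}, hops={1})  # path with identity and label
```

Because the stored path `p1` has its own identifier, it can be labeled and given properties in the same way as a node or an edge, which is the sense in which paths are treated as first-class objects.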
The core section (3) "demonstrate[s] and explain[s] the main features of the G-CORE language," for example, returning a graph for every query, matching and filtering, multi-graph queries and joins, dealing with multi-valued properties, constructions that respect identities, graph aggregations, treatment of paths, existential subqueries, views and optional matching, weighted shortest paths, and the use of graph patterns. All constructs are explained through examples.
In section 4, the authors "provide a formal definition of the syntax and semantics of [G-CORE]." A complexity analysis is also given here. The authors prove that "evaluating [a query, q] over an input PPG G can be computed in polynomial time," that is, G-CORE is tractable. In section 5, the authors "show how G-CORE is extended to handle tabular data." This approach is in accordance with today's trends to integrate heterogeneous data and to develop polyglot databases.
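To give some intuition for why path queries can stay tractable, the sketch below finds one shortest path whose edge labels conform to a regular expression, using breadth-first search over pairs of graph node and automaton state. This is a standard textbook technique, not the evaluation algorithm from the paper. The function name `shortest_conforming_path`, the edge-list encoding, and the hand-written automaton for `knows+` are assumptions made for illustration.

```python
from collections import deque


def shortest_conforming_path(edges, dfa, source, target):
    """Return the nodes of one shortest path from source to target whose label
    sequence is accepted by the automaton, or None if no such path exists.

    edges: list of (src, label, dst) triples
    dfa:   (start_state, accepting_states, delta), delta[(state, label)] -> state
    """
    start, accepting, delta = dfa
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))

    first = (source, start)
    parent = {first: None}  # visited (node, state) pairs with back-pointers
    queue = deque([first])
    while queue:
        node, state = queue.popleft()
        if node == target and state in accepting:
            # Walk the back-pointers to rebuild the node sequence.
            path, cur = [], (node, state)
            while cur is not None:
                path.append(cur[0])
                cur = parent[cur]
            return path[::-1]
        for label, nxt in adjacency.get(node, []):
            nxt_state = delta.get((state, label))
            if nxt_state is None:
                continue  # the automaton has no transition on this label
            pair = (nxt, nxt_state)
            if pair not in parent:
                parent[pair] = (node, state)
                queue.append(pair)
    return None


# Hypothetical toy graph and an automaton for the expression knows+
edges = [("a", "knows", "b"), ("b", "knows", "c"),
         ("a", "likes", "c"), ("c", "knows", "d")]
knows_plus = (0, {1}, {(0, "knows"): 1, (1, "knows"): 1})

result = shortest_conforming_path(edges, knows_plus, "a", "d")
```

Each (node, state) pair is enqueued at most once, so the search visits at most (number of nodes) × (number of automaton states) pairs. That bound is polynomial in the size of the graph and the expression, whereas enumerating all conforming paths could be exponential.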
Section 6 presents related work, that is, the most important industrial graph database products as represented by their graph query languages: Gremlin, Cypher, and PGQL. The authors "describe the main differences among G-CORE [and these languages]." Finally, in the conclusion, one more use is emphasized: G-CORE could be used as a base for integrating many graph-oriented data models and approaches to querying graphs.
The notions defined in the paper are specified in the usual denotational way, which provides the needed clarity and precision. Many representative examples improve the paper's readability. Without doubt, the paper offers interesting, valuable, and useful information for those interested in graph query languages.