Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Querying Graphs with Data

Published: 20 March 2016 Publication History
  • Get Citation Alerts
  • Abstract

    Graph databases have received much attention as of late due to numerous applications in which data is naturally viewed as a graph; these include social networks, RDF and the Semantic Web, biological databases, and many others. There are many proposals for query languages for graph databases that mainly fall into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. These approaches, however, lack the ability to combine data and topology, which would allow queries asking how data changes along paths and patterns enveloping it.
    In this article, we present a comprehensive study of languages that enable such combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how labels of edges change along a path, but now we extend them with ways of specifying how both labels and data change. From the complexity point of view, the right type of formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions to specify paths with data and study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, we have a family of languages, and our study shows that it includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.

    Supplementary Material

    a14-libkin-apndx.pdf (libkin.zip)
    Supplemental movie, appendix, image and software files for, Querying Graphs with Data

    References

    [1]
    S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases. Addison-Wesley.
    [2]
    S. Abiteboul, L. Segoufin, and V. Vianu. 2006. Representing and querying XML with incomplete information. ACM Trans. Database Syst. 31, 1 (2006), 208--254.
    [3]
    S. Abiteboul and V. Vianu. 1999. Regular path queries with constraints. J. Comput. Syst. Sci. 3, 58 (1999), 428--452.
    [4]
    A. V. Aho. 1990. Handbook of Theoretical Computer Science. 255--300 pages.
    [5]
    N. Alechina, S. Demri, and M. de Rijke. 2003. A modal perspective on path constraints. J. Log. Comput. 13, 6 (2003), 939--956.
    [6]
    N. Alechina and N. Immerman. 2000. Reachability logic: An efficient fragment of transitive closure logic. Logic J. IGPL 8, 3 (2000), 325--337.
    [7]
    H. Andréka, I. Németi, and I. Sain. 2001. Algebraic logic. In Handbook of Philosophical Logic (2nd ed.). Vol. 2. Springer.
    [8]
    R. Angles and C. Gutierrez. 2008. Survey of graph database models. Comput. Surv. 40, 1 (2008), 1:1--1:39.
    [9]
    P. Barceló. 2013. Querying graph databases. In 32th ACM Symposium on Principles of Database Systems (PODS’13).
    [10]
    P. Barceló, D. Figueira, and L. Libkin. 2012a. Graph logics with rational relations and the generalized intersection problem. In 27th Annual IEEE Symposium on Logic in Computer Science (LICS’12).
    [11]
    P. Barceló, L. Libkin, A. W. Lin, and P. T. Wood. 2012b. Expressive languages for path queries over graph-structured data. ACM TODS 38, 4 (2012), 31:1--31:46.
    [12]
    P. Barceló, L. Libkin, A. Poggi, and C. Sirangelo. 2010. XML with incomplete information. J. ACM 58, 1 (2010), 1--62.
    [13]
    P. Barceló, L. Libkin, and J. L. Reutter. 2014. Querying regular graph patterns. J. ACM 61, 1 (2014), 8:1--8:54.
    [14]
    P. Barceló, J. Pérez, and J. L. Reutter. 2012c. Relative expressiveness of nested regular expressions. In Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW). 180--195.
    [15]
    P. Barceló, R. Pichler, and S. Skritek. 2015. Efficient evaluation and approximation of well-designed pattern trees. In ACM Symposium on Principles of Database Systems (PODS’15). 131--144.
    [16]
    M. Benedikt, W. Fan, and F. Geerts. 2008. XPath satisfiability in the presence of DTDs. J. ACM 55, 2 (2008), 8:1--8:79.
    [17]
    M. Bojanczyk. 2010. Automata for data words and data trees. In Proceedings of the 21st International Conference on Rewriting Techniques and Applications (RTA).
    [18]
    M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, and L. Segoufin. 2011. Two-variable logic on words with data. ACM TOCL 12, 4 (2011), 27:1--27:26.
    [19]
    M. Bojanczyk, A. Muscholl, T. Schwentick, and L. Segoufin. 2009. Two-variable logic on data trees and XML reasoning. J. ACM 56, 3 (2009), 13:1--13:48.
    [20]
    M. Bojanczyk and P. Parys. 2011. XPath evaluation in linear time. J. ACM 58, 4 (2011), 17:1--17:33.
    [21]
    P. Bouyer, A. Petit, and D. Thérien. 2001. An algebraic characterization of data and timed languages. In International Conference on Concurrency Theory (CONCUR). 248--261.
    [22]
    D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. 2000. Containment of conjunctive regular path queries with inverse. In 7th International Conference on Principles of Knowledge Representation and Reasoning (KR’00). 176--185.
    [23]
    D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. 2009. An automata-theoretic approach to regular XPath. In 12th International Symposium on Database Programming Languages (DBLP). 18--35.
    [24]
    S. Cassidy. 2003. Generalizing XPath for directed graphs. In Extreme Markup Languages.
    [25]
    R. Cleaveland and B. Steffen. 1993. A linear-time model-checking algorithm for the alternation-free modal mu-calculus. Formal Methods Syst. Design 2, 2 (1993), 121--147.
    [26]
    M. Consens and A. O. Mendelzon. 1990. GraphLog: A visual formalism for real life recursion. In 9th ACM Symposium on Principles of Database Systems (PODS’90). 404--416.
    [27]
    I. Cruz, A. O. Mendelzon, and P. Wood. 1987. A graphical query language supporting recursion. In ACM Special Interest Group on Management of Data 1987 Annual Conference (SIGMOD’87). 323--330.
    [28]
    S. Demri and R. Lazić. 2009. LTL with the freeze quantifier and register automata. ACM TOCL 10, 3 (2009), 16:1--16:30.
    [29]
    S. Demri, R. Lazić, and D. Nowak. 2007. On the freeze quantifier in constraint LTL: Decidability and complexity. Inf. Comput. 205, 1 (2007), 2--24.
    [30]
    A. Deutsch and V. Tannen. 2001. Optimization properties for classes of conjunctive regular path queries. In 8th International Workshop on Database Programming Languages (DBPL’01). 21--39.
    [31]
    Dex. 2013. DEX query language, Sparsity Technologies. Retrieved from http://www.sparsity-technologies.com/dex.php.
    [32]
    Facebook. 2014. Graph Search. Retrieved from https://www.facebook.com/about/graphsearch.
    [33]
    W. Fan. 2012. Graph pattern matching revised for social network analysis. In 15th International Conference on Database Theory (ICDT). 8--21.
    [34]
    D. Figueira. 2009. Satisfiability of downward XPath with data equality tests. In 28th ACM Symposium on Principles of Database Systems (PODS’09). 197--206.
    [35]
    D. Figueira. 2010. Reasoning on Words and Trees with Data. Ph.D. Dissertation. ÉNS de Cachan.
    [36]
    G. H. L. Fletcher, M. Gyssens, D. Leinders, J. Van den Bussche, D. Van Gucht, S. Vansummeren, and Y. Wu. 2011. Relative expressive power of navigational querying on graphs. In 14th International Conference on Database Theory (ICDT). 197--207.
    [37]
    D. Florescu, A. Y. Levy, and D. Suciu. 1998. Query containment for conjunctive queries with regular expressions. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 139--148.
    [38]
    S. Fortune, J. Hopcroft, and J. Wyllie. 1980. The directed homeomorphism problem. Theor. Comput. Sci. 10 (1980), 111--121.
    [39]
    G. Gottlob, C. Koch, and R. Pichler. 2005. Efficient algorithms for processing XPath queries. ACM Trans. Database Syst. 30, 2 (2005), 444--491.
    [40]
    Gremlin 2013. Gremlin Language. Retrieved from https://github.com/tinkerpop/gremlin/wiki.
    [41]
    C. Gutierrez, C. Hurtado, A. O. Mendelzon, and J. Pérez. 2011. Foundations of semantic web databases. J. Comput. System Sci. 77, 3 (2011), 520--541.
    [42]
    S. Harris and A. Seaborne. 2013. SPARQL 1.1 Query Language. W3C Recommendation. http://www.w3.org/ TR/sparql11-query/. (March 2013).
    [43]
    M. Kaminski and N. Francez. 1994. Finite memory automata. Theor. Comput. Sci. 134, 2 (1994), 329--363.
    [44]
    M. Kaminski and T. Tan. 2006. Regular expressions for languages over infinite alphabets. Fundamenta Informaticae 69, 3 (2006), 301--318.
    [45]
    M. Kaminski and T. Tan. 2008. Tree automata over infinite alphabets. In Pillars of Computer Science. 386--423.
    [46]
    M. Kay. 2004. XPath 2.0 Programmer’s Reference. Wrox.
    [47]
    E. V. Kostylev, J. L. Reutter, M. Romero, and D. Vrgoč. 2015b. SPARQL with property paths. In The Semantic Web - 14th International Semantic Web Conference, Proceedings, Part I (ISWC’15). 3--18.
    [48]
    E. V. Kostylev, J. L. Reutter, and D. Vrgoč. 2015a. XPath for DL ontologies. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI). 1525--1531.
    [49]
    G. M. Kuper, L. Libkin, and J. Paredaens (Eds.). 2000. Constraint Databases. Springer.
    [50]
    M. Lange. 2006. Model checking propositional dynamic logic with all extras. J. Appl. Logic 4, 1 (2006), 39--49.
    [51]
    L. Libkin. 2004. Elements of Finite Model Theory. Springer.
    [52]
    L. Libkin. 2014. Certain answers as objects and knowledge. In Principles of Knowledge Representation and Reasoning (KR’14). 328--337.
    [53]
    L. Libkin, W. Martens, and D. Vrgoč. 2013a. Querying graph databases with XPath. In Joint 2013 EDBT/ICDT Conferences (ICDT'13).
    [54]
    L. Libkin, J. L. Reutter, and D. Vrgoč. 2013b. TriAL for RDF: Adapting graph query languages for RDF data. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS).
    [55]
    L. Libkin and D. Vrgoč. 2012a. Regular expressions for data words. In Logic for Programming, Artificial Intelligence, and Reasoning (LPAR). 274--288.
    [56]
    L. Libkin and D. Vrgoč. 2012b. Regular path queries on graphs with data. In 15th International Conference on Database Theory (ICDT). 74--85.
    [57]
    K. Losemann and W. Martens. 2012. The complexity of evaluating path expressions in SPARQL. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 101--112.
    [58]
    M. Marx. 2003. XPath and modal logics of finite DAG’s. In International Conference on Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX). 150--164.
    [59]
    M. Marx. 2005. Conditional XPath. ACM Trans. Database Syst. 30, 4 (2005), 929--959.
    [60]
    Neo4j 2013. Neo4j, The Graph Database. Retrieved from http://www.neo4j.org/.
    [61]
    F. Neven, Th. Schwentick, and V. Vianu. 2004. Finite state machines for strings over infinite alphabets. ACM Trans. Comput. Log. 5, 3 (2004), 403--435.
    [62]
    J. Pérez, M. Arenas, and C. Gutierrez. 2009. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34, 3 (2009), 16:1--16:45.
    [63]
    J. Pérez, M. Arenas, and C. Gutierrez. 2010. nSPARQL: A navigational language for RDF. J. Web Semantics 8, 4 (2010), 255--270.
    [64]
    J. L. Reutter. 2013a. Containment of Nested Regular Expressions. CoRR abs/1304.2637. (2013).
    [65]
    J. L. Reutter. 2013b. Graph Patterns: Structure, Query Answering and Applications in Schema Mappings and Formal Language Theory. Ph.D. Dissertation. School of Informatics, University of Edinburgh.
    [66]
    J. L. Reutter, A. Soto, and D. Vrgoč. 2015. Recursion in SPARQL. In The Semantic Web - 14th International Semantic Web Conference, Proceedings, Part I (ISWC’15). 19--35.
    [67]
    R. Ronen and O. Shmueli. 2009. SoQL: A language for querying and creating data in social networks. In 25th International Conference on Data Engineering (ICDE’09). 1595--1602.
    [68]
    H. Sakamoto and D. Ikeda. 2000. Intractability of decision problems for finite-memory automata. Theor. Comput. Sci. 231, 2 (2000), 297--308.
    [69]
    M. San Martín and C. Gutierrez. 2009. Representing, querying and transforming social networks with RDF/SPARQL. In 6th European Semantic Web Conference (ESWC’09). 293--307.
    [70]
    L. Segoufin. 2006. Automata and logics for words and trees over an infinite alphabet. In Computer Science Logic (CSL). 41--57.
    [71]
    L. Segoufin. 2007. Static analysis of XML processing with data values. SIGMOD Record 36, 1 (2007), 31--38.
    [72]
    A. Tarski and S. Givant. 1987. A Formalization of Set Theory Without Variables. AMS.
    [73]
    B. ten Cate. 2006. The expressivity of XPath with transitive closure. In 25th ACM Symposium on Principles of Database Systems (PODS’06). 328--337.
    [74]
    B. ten Cate and M. Marx. 2007. Navigational XPath: Calculus and algebra. Sigmod Record 36, 2 (2007), 19--26.
    [75]
    D. Vrgoč. 2014. Querying Graphs with Data. Ph.D. Dissertation. School of Informatics, University of Edinburgh.
    [76]
    P. T. Wood. 2012. Query languages for graph databases. Sigmod Record 41, 1 (2012), 50--60.

    Cited By

    View all

    Recommendations

    Reviews

    Jaroslav Pokorny

    Various approaches to querying so-called graphs with data are described very comprehensively in this paper. In principle, this category of graphs is called (labeled) property graphs in the graph database community. Concerning queries in such databases, two categories are considered: one views graphs as sets of relational data, and the other concentrates on querying the topology of the graph. The authors focus on querying that combines both of these approaches. In section 2, containing preliminaries, an important notion of path and various types of complexity of query evaluation are defined. In the forthcoming sections, the authors deal with two categories of languages: path languages and graph languages. Also, notions like regular path queries (RPQs), conjunctive RPQs (CRPQs), and nested regular expressions (NREs) are introduced. Section 3 discusses existing languages for paths, particularly register automata and related classes of expressions focused on data path queries. First, regular data path queries (RDPQs) are studied, followed by regular query expressions with memory (RQM), which use regular expressions with memory (REM). Another class of regular expressions for data paths permits testing for equality or inequality of data; this enables consideration of queries based on regular expressions with equality. The authors talk about regular queries with datasets (RQDs). The core section 4 shows how the XPath language can be adapted to work over graphs. XPath has many desirable properties, such as the ability to define patterns that cannot be captured by paths, close connections to first-order logic, and efficient evaluation algorithms. The authors show how to extend the language to operate over graph databases while still retaining these properties. Such a language is called GPath here, and it is discussed with different data tests permitted in its formulas. Based on two categories of path formulas, the regular graph XPath (GXPathreg) and core graph XPath (GXPathcore) languages are introduced. Some extensions of both languages with counters are also discussed. The next two subsections of section 4 focus on query evaluation and expressive power. The authors investigate the complexity of querying graph databases using variants of GXPath and its many dialects when compared to first-order logic. They provide a detailed analysis of expressiveness for navigational features of these dialects and prove that GXPathcore is a proper subset of GXPathreg. Another result says that GXPathreg is equivalent to a three-variable fragment of the transitive-closer first-order logic. Expressiveness of languages using datasets is also studied. The rest of section 4 contains a comparison of GXPath to path languages introduced in section 3, as well as to traditional navigational languages such as RPQs, CRPQs, and NREs. A hierarchy of the GXPath fragments with regard to their relative expressive power illustrates their mutual relationships. In section 5, the authors define conjunctive queries using languages from previous sections as their basic building blocks. Section 6 includes a summary of complexity results for classes of queries studied in the paper. Finally, section 6 offers a number of possible directions for future research. The consideration of incomplete information is very realistic, since it occurs very often in practice, particularly in querying graphs with data. The notions defined in the paper are specified in a usual denotational way, which offers the needed clarity and preciseness. This increases the readability of the paper. Without a doubt, the paper offers interesting, valuable, and useful material for those interested in graph query languages and the associated theory. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Information & Contributors

    Information

    Published In

    cover image Journal of the ACM
    Journal of the ACM  Volume 63, Issue 2
    May 2016
    249 pages
    ISSN:0004-5411
    EISSN:1557-735X
    DOI:10.1145/2906142
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 March 2016
    Accepted: 01 December 2015
    Revised: 01 December 2015
    Received: 01 September 2014
    Published in JACM Volume 63, Issue 2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Graph databases
    2. XPath
    3. data values
    4. navigational queries

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • DFG
    • Millennium Nucleus Center for Semantic Web Research
    • EPSRC

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)74
    • Downloads (Last 6 weeks)7
    Reflects downloads up to

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Process Query LanguageInformation Systems10.1016/j.is.2023.102337122:COnline publication date: 1-May-2024
    • (2024)Cardinality estimation for property graph queries with gated learning approach on the graph databaseMultimedia Tools and Applications10.1007/s11042-024-19215-7Online publication date: 7-May-2024
    • (2024)Multi-model query languages: taming the variety of big dataDistributed and Parallel Databases10.1007/s10619-023-07433-142:1(31-71)Online publication date: 1-Mar-2024
    • (2023)GPC: A Pattern Calculus for Property GraphsProceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3584372.3588662(241-250)Online publication date: 18-Jun-2023
    • (2023)An epistemic approach to model uncertainty in data-graphsInternational Journal of Approximate Reasoning10.1016/j.ijar.2023.108948160:COnline publication date: 1-Sep-2023
    • (2023)Modeling Regex Operators for Solving Regex Crossword PuzzlesDependable Software Engineering. Theories, Tools, and Applications10.1007/978-981-99-8664-4_12(206-225)Online publication date: 27-Nov-2023
    • (2023) Data Graphs with Incomplete Information (and a Way to Complete Them)Logics in Artificial Intelligence10.1007/978-3-031-43619-2_49(729-744)Online publication date: 20-Sep-2023
    • (2022)Data Path Queries over Embedded Graph DatabasesProceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3517804.3524159(189-201)Online publication date: 12-Jun-2022
    • (2022)The Complexity of Regular Trail and Simple Path Queries on Undirected GraphsProceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3517804.3524149(165-174)Online publication date: 12-Jun-2022
    • (2022)Conjunctive Regular Path Queries with Capture GroupsACM Transactions on Database Systems10.1145/351423047:2(1-52)Online publication date: 23-May-2022
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media